]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
4 years agobatman-adv: Revert "disable ethtool link speed detection when auto negotiation off"
Sven Eckelmann [Mon, 25 Nov 2019 09:46:50 +0000 (10:46 +0100)]
batman-adv: Revert "disable ethtool link speed detection when auto negotiation off"

The commit 341ea390762f ("batman-adv: disable ethtool link speed detection
when auto negotiation off") disabled the usage of ethtool's link_ksetting
when auto negotation was enabled due to invalid values when used with
tun/tap virtual net_devices. According to the patch, automatic measurements
should be used for these kind of interfaces.

But there are major flaws with this argumentation:

* automatic measurements are not implemented
* auto negotiation has nothing to do with the validity of the retrieved
  values

The first point has to be fixed by a longer patch series. The "validity"
part of the second point must be addressed in the same patch series by
dropping the usage of ethtool's link_ksetting (thus always doing automatic
measurements over ethernet).

Drop the patch again to have more default values for various net_device
types/configurations. The user can still overwrite them using the
batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE.

Reported-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
4 years agobatman-adv: use rcu_replace_pointer() where appropriate
Antonio Quartulli [Wed, 20 May 2020 08:41:40 +0000 (10:41 +0200)]
batman-adv: use rcu_replace_pointer() where appropriate

In commit 2383fc643ad2 ("rcu: Upgrade rcu_swap_protected() to
rcu_replace_pointer()") a new helper macro named rcu_replace_pointer() was
introduced to simplify code requiring to switch an rcu pointer to a new
value while extracting the old one.

Use rcu_replace_pointer() where appropriate to make code slimer.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
4 years agobatman-adv: Revert "Drop lockdep.h include for soft-interface.c"
Sven Eckelmann [Wed, 6 May 2020 20:13:30 +0000 (22:13 +0200)]
batman-adv: Revert "Drop lockdep.h include for soft-interface.c"

The commit ea846a7330c3 ("net: partially revert dynamic lockdep key
changes") reverts the commit 6e290b1b3976 ("net: core: add generic lockdep
keys"). But it forgot to also revert the commit f68bdccf9d77 ("batman-adv:
Drop lockdep.h include for soft-interface.c") which depends on the latter.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
4 years agonet: partially revert dynamic lockdep key changes
Cong Wang [Sun, 3 May 2020 05:22:19 +0000 (22:22 -0700)]
net: partially revert dynamic lockdep key changes

This patch reverts the folowing commits:

commit d392d5dfc5130ce8ed88ee9aceba4d7228e19bf6
"bonding: add missing netdev_update_lockdep_key()"

commit d0540fbe26386f3fa8acfaa4d995b68d37b0f059
"net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"

commit 121463dd78a9149fff18d8e01512932e5e7bad68
"net: fix kernel-doc warning in <linux/netdevice.h>"

commit 6e290b1b3976f13ab47be199689863892bf41cda
"net: core: add generic lockdep keys"

but keeps the addr_list_lock_key because we still lock
addr_list_lock nestedly on stack devices, unlikely xmit_lock
this is safe because we don't take addr_list_lock on any fast
path.

Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-ethernet-ti-k3-introduce-common-platform-time-sync-driver-cpts'
David S. Miller [Mon, 4 May 2020 19:02:03 +0000 (12:02 -0700)]
Merge branch 'net-ethernet-ti-k3-introduce-common-platform-time-sync-driver-cpts'

Grygorii Strashko says:

====================
net: ethernet: ti: k3: introduce common platform time sync driver - cpts

This series introduced support for significantly upgraded TI A65x/J721E Common
platform time sync (CPTS) modules which are part of AM65xx Time Synchronization
Architecture [1].
The TI A65x/J721E now contain more than one CPTS instance:
- MCU CPSW CPTS (IEEE 1588 compliant)
- Main NAVSS CPTS (central)
- PCIe CPTS(s) (PTM  compliant)
- J721E: Main CPSW9g CPTS (IEEE 1588 compliant)
which can work as separately as interact to each other through Time Sync Router
(TSR) and Compare Event Router (CER). In addition there are also ICSS-G IEP
blocks which can perform similar timsync functions, but require FW support.
More info also available in TRM [2][3]. Not all above modules are available
to the Linux by as of now as some of them are reserved for RTOS/FW purposes.

The scope of this submission is TI A65x/J721E CPSW CPTS and Main NAVSS CPTS,
and TSR was used for testing purposes.
                                                                       +---------------------------+
                                                                       | MCU CPSW                  |
+-------------------+           +------------------------+             |                TS         |
| Main Navss CPTS   |           | Time Sync Router (TSR) |             |          +-------------+  |
|                   |           |                        |             |          |             |  |
|            HW1_TS +<----------+                        |             | +--------v-----+    +--+--+
|                   |           |                        |             | |        CPTS  |    |Port |
|              ...  |           |                        |           X+-->HW1_TS        |    |     |
|            HW8_TS <------------<---------+             |           X|-->HW2_TS        |    +--^--+
|                   |           |          |             +--------------->HW3_TS        |       |  |
|                   |           |          |             +--------------->HW4_TS        |       |  |
|                   |           |          |             |             | |              |       |  |
|                   |           |          |             |             | |              |       |  |
|            Genf0  +----------->          (A)---------+ +<--------------+Genf0         |       |  |
|                   |           |          |             |             | |              |       |  |
|              ...  |           |          +-----------> <---------------+Genf1     ESTf+-------+  |
|                   |           |                        |             | |              |          |
|                   |           |                        |             | +--------------+          |
|            Genf8  +---------->+                        |             |                           |
|                   |           |    SYNC0 ...    SYNC3  |             |                           |
+-------------------+           +------+------------+----+             +---------------------------+
                                       +            +
                                       X            X
(A) shows possible routing path for MCU CPSW CPTS Genf0 signal as an example.

Main features of the new TI A65x/J721E CPTS modules are:
- 64-bit timestamp/counter mode support in ns by using add_val
- implemented in HW PPM and nudge adjustment.
- control of time sync events via interrupt or polling
- selection of multiple external reference clock sources
- hardware timestamp of ext. inputs events (HWx_TS_PUSH)
- periodic generator function outputs (TS_GENFx)
- (CPSW only) Ethernet Enhanced Scheduled Traffic Operations (CPTS_ESTFn),
  which drives TSN schedule
- timestamping of all RX packets bypassing CPTS FIFO

Patch 1 - DT bindings
Patch 2 - the AM65x/J721E driver
Patch 3 - enables packet timestamping support in TI AM65x/J721E MCU CPSW driver.
Patches 4-7 - DT updates.

=== PTP Testing:

phc2sys -s CLOCK_REALTIME -c eth0 -m -O 0 -u30
phc2sys[627.331]: eth0 rms 409912446712787392 max 1587584079521858304 freq  -6665 +/- 35040 delay   832 +/-  27
phc2sys[657.335]: eth0 rms   33 max   66 freq     -0 +/-  28 delay   820 +/-  30
phc2sys[687.339]: eth0 rms   37 max   70 freq     -1 +/-  32 delay   830 +/-  29
phc2sys[717.343]: eth0 rms   33 max   71 freq     -0 +/-  29 delay   828 +/-  23
phc2sys[747.346]: eth0 rms   35 max   75 freq     -0 +/-  31 delay   829 +/-  26
phc2sys[777.350]: eth0 rms   37 max   68 freq     -1 +/-  32 delay   825 +/-  25
phc2sys[807.354]: eth0 rms   28 max   57 freq     -1 +/-  25 delay   824 +/-  21
phc2sys[837.358]: eth0 rms   43 max   81 freq     -1 +/-  37 delay   836 +/-  23
phc2sys[867.361]: eth0 rms   33 max   74 freq     +0 +/-  29 delay   828 +/-  24
phc2sys[897.365]: eth0 rms   35 max   77 freq     -2 +/-  30 delay   824 +/-  25
phc2sys[927.369]: eth0 rms   28 max   50 freq     +0 +/-  25 delay   825 +/-  25

ptp4l -P -2 -H -i eth0 -l 6 -m -q -p /dev/ptp1 -f ptp.cfg -s
ptp4l[22095.754]: port 1: MASTER to UNCALIBRATED on RS_SLAVE
ptp4l[22097.754]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[22159.757]: rms  317 max 1418 freq    +79 +/- 186 delay   410 +/-   1
ptp4l[22223.760]: rms    9 max   24 freq    +42 +/-  12 delay   409 +/-   1
ptp4l[22287.763]: rms   10 max   28 freq    +41 +/-  11 delay   410 +/-   1
ptp4l[22351.767]: rms   10 max   26 freq    +34 +/-  12 delay   410 +/-   1
ptp4l[22415.770]: rms   10 max   26 freq    +49 +/-  14 delay   410 +/-   1

=== Ext. HW_TS and Genf testing:

For testing purposes Time Sync Router (TSR) can be modeled in DT as pin controller
+       timesync_router: timesync_router@A40000 {
+               compatible = "pinctrl-single";
+               reg = <0x0 0xA40000 0x0 0x800>;
+               #address-cells = <1>;
+               #size-cells = <0>;
+               #pinctrl-cells = <1>;
+               pinctrl-single,register-width = <32>;
+               pinctrl-single,function-mask = <0x800007ff>;
+       };

then signals routing can be done in board file, for example:
+#define TS_OFFSET(pa, val)     (0x4+(pa)*4) (0x80000000 | val)
+
+&timesync_router {
+       pinctrl-names = "default";
+       pinctrl-0 = <&mcu_cpts>;
+
+       /* Example of the timesync routing */
+       mcu_cpts: mcu_cpts {
+               pinctrl-single,pins = <
+                       /* [cpts genf1] in13 -> out25 [cpts hw4_push] */
+                       TS_OFFSET(25, 13)
+                       /* [cpts genf1] in13 -> out0 [main cpts hw1_push] */
+                       TS_OFFSET(0, 13)
+                       /* [main cpts genf0] in4 -> out1 [main cpts hw2_push] */
+                       TS_OFFSET(1, 4)
+                       /* [main cpts genf0] in4 -> out24 [cpts hw3_push] */
+                       TS_OFFSET(24, 4)
+               >;
+       };
+};

will create link:
    cpsw cpts Genf1 -> main cpts hw1_push
                    -> cpsw cpts hw4_push

    main cpts Genf0 -> main cpts hw2_push
                    -> cpsw cpts hw3_push

 testptp -d /dev/ptp0 -i 0 -p 1000000000
 periodic output request okay
 testptp -d /dev/ptp0 -i 1 -e 5
 external time stamp request okay
 event index 1 at 22583.000000025
 event index 1 at 22584.000000025
 event index 1 at 22585.000000025
 event index 1 at 22586.000000025
 event index 1 at 22587.000000025
 testptp -d /dev/ptp1 -i 2 -e 5
 external time stamp request okay
 event index 2 at 1587606764.249304554
 event index 2 at 1587606765.249304467
 event index 2 at 1587606766.249304380
 event index 2 at 1587606767.249304293
 event index 2 at 1587606768.249304206

[1] https://www.ti.com/lit/pdf/spracp7
[2] https://www.ti.com/lit/pdf/sprz452
[3] https://www.ti.com/lit/pdf/spruil1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoarm64: dts: ti: j721e-main: add main navss cpts node
Grygorii Strashko [Fri, 1 May 2020 20:50:11 +0000 (23:50 +0300)]
arm64: dts: ti: j721e-main: add main navss cpts node

Add DT node for Main NAVSS CPTS module.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoarm64: dts: ti: k3-j721e-mcu: add mcu cpsw cpts node
Grygorii Strashko [Fri, 1 May 2020 20:50:10 +0000 (23:50 +0300)]
arm64: dts: ti: k3-j721e-mcu: add mcu cpsw cpts node

Add DT node for The TI J721E MCU CPSW CPTS which is part of MCU CPSW NUSS.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoarm64: dts: ti: k3-am65-main: add main navss cpts node
Grygorii Strashko [Fri, 1 May 2020 20:50:09 +0000 (23:50 +0300)]
arm64: dts: ti: k3-am65-main: add main navss cpts node

Add DT node for Main NAVSS CPTS module.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoarm64: dts: ti: k3-am65-mcu: add cpsw cpts node
Grygorii Strashko [Fri, 1 May 2020 20:50:08 +0000 (23:50 +0300)]
arm64: dts: ti: k3-am65-mcu: add cpsw cpts node

Add DT node for the TI AM65x SoC Common Platform Time Sync (CPTS).

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support
Grygorii Strashko [Fri, 1 May 2020 20:50:07 +0000 (23:50 +0300)]
net: ethernet: ti: am65-cpsw-nuss: enable packet timestamping support

The MCU CPSW Common Platform Time Sync (CPTS) provides possibility to
timestamp TX PTP packets and all RX packets.

This enables corresponding support in TI AM65x/J721E MCU CPSW driver.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: ti: introduce am654 common platform time sync driver
Grygorii Strashko [Fri, 1 May 2020 20:50:06 +0000 (23:50 +0300)]
net: ethernet: ti: introduce am654 common platform time sync driver

The CPTS module is used to facilitate host control of time sync operations.
Main features of CPTS module are:
- selection of multiple external clock sources
- control of time sync events via interrupt or polling
- 64-bit timestamp mode in ns with HW PPM and nudge adjustment.
- hardware timestamp ext. inputs (HWx_TS_PUSH)
- timestamp Generator function outputs (TS_GENFx)
Depending on integration it enables compliance with the IEEE 1588-2008
standard for a precision clock synchronization protocol, Ethernet Enhanced
Scheduled Traffic Operations (CPTS_ESTFn) and PCIe Subsystem Precision Time
Measurement (PTM).

Introduced driver provides Linux PTP hardware clock for each CPTS device
and network packets timestamping where applicable. CPTS PTP hardware clock
supports following operations:
    - Set time
    - Get time
    - Shift the clock by a given offset atomically
    - Adjust clock frequency
    - Time stamp external events
    - Periodic output signals

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-binding: ti: am65x: document common platform time sync cpts module
Grygorii Strashko [Fri, 1 May 2020 20:50:05 +0000 (23:50 +0300)]
dt-binding: ti: am65x: document common platform time sync cpts module

Document device tree bindings for TI AM654/J721E SoC The Common Platform
Time Sync (CPTS) module. The CPTS module is used to facilitate host control
of time sync operations. Main features of CPTS module are:
  - selection of multiple external clock sources
  - 64-bit timestamp mode in ns with ppm and nudge adjustment.
  - control of time sync events via interrupt or polling
  - hardware timestamp of ext. events (HWx_TS_PUSH)
  - periodic generator function outputs (TS_GENFx)
  - PPS in combination with timesync router
  - Depending on integration it enables compliance with the IEEE 1588-2008
standard for a precision clock synchronization protocol, Ethernet Enhanced
Scheduled Traffic Operations (CPTS_ESTFn) and PCIe Subsystem Precision Time
Measurement (PTM).

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'devlink-kernel-region-snapshot-id-allocation'
David S. Miller [Mon, 4 May 2020 18:58:31 +0000 (11:58 -0700)]
Merge branch 'devlink-kernel-region-snapshot-id-allocation'

Jakub Kicinski says:

====================
devlink: kernel region snapshot id allocation

currently users have to find a free snapshot id to pass
to the kernel when they are requesting a snapshot to be
taken.

This set extends the kernel so it can allocate the id
on its own and send it back to user space in a response.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodocs: devlink: clarify the scope of snapshot id
Jakub Kicinski [Fri, 1 May 2020 16:40:42 +0000 (09:40 -0700)]
docs: devlink: clarify the scope of snapshot id

In past discussions Jiri explained snapshot ids are cross-region.
Explain this in the docs.

v3: new patch

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodevlink: let kernel allocate region snapshot id
Jakub Kicinski [Fri, 1 May 2020 16:40:41 +0000 (09:40 -0700)]
devlink: let kernel allocate region snapshot id

Currently users have to choose a free snapshot id before
calling DEVLINK_CMD_REGION_NEW. This is potentially racy
and inconvenient.

Make the DEVLINK_ATTR_REGION_SNAPSHOT_ID optional and try
to allocate id automatically. Send a message back to the
caller with the snapshot info.

Example use:
$ devlink region new netdevsim/netdevsim1/dummy
netdevsim/netdevsim1/dummy: snapshot 1

$ id=$(devlink -j region new netdevsim/netdevsim1/dummy | \
       jq '.[][][][]')
$ devlink region dump netdevsim/netdevsim1/dummy snapshot $id
[...]
$ devlink region del netdevsim/netdevsim1/dummy snapshot $id

v4:
 - inline the notification code
v3:
 - send the notification only once snapshot creation completed.
v2:
 - don't wrap the line containing extack;
 - add a few sentences to the docs.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodevlink: factor out building a snapshot notification
Jakub Kicinski [Fri, 1 May 2020 16:40:40 +0000 (09:40 -0700)]
devlink: factor out building a snapshot notification

We'll need to send snapshot info back on the socket
which requested a snapshot to be created. Factor out
constructing a snapshot description from the broadcast
notification code.

v3: new patch

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet_sched: sch_fq: add horizon attribute
Eric Dumazet [Fri, 1 May 2020 14:07:41 +0000 (07:07 -0700)]
net_sched: sch_fq: add horizon attribute

QUIC servers would like to use SO_TXTIME, without having CAP_NET_ADMIN,
to efficiently pace UDP packets.

As far as sch_fq is concerned, we need to add safety checks, so
that a buggy application does not fill the qdisc with packets
having delivery time far in the future.

This patch adds a configurable horizon (default: 10 seconds),
and a configurable policy when a packet is beyond the horizon
at enqueue() time:
- either drop the packet (default policy)
- or cap its delivery time to the horizon.

$ tc -s -d qd sh dev eth0
qdisc fq 8022: root refcnt 257 limit 10000p flow_limit 100p buckets 1024
 orphan_mask 1023 quantum 10Kb initial_quantum 51160b low_rate_threshold 550Kbit
 refill_delay 40.0ms timer_slack 10.000us horizon 10.000s
 Sent 1234215879 bytes 837099 pkt (dropped 21, overlimits 0 requeues 6)
 backlog 0b 0p requeues 6
  flows 1191 (inactive 1177 throttled 0)
  gc 0 highprio 0 throttled 692 latency 11.480us
  pkts_too_long 0 alloc_errors 0 horizon_drops 21 horizon_caps 0

v2: fixed an overflow on 32bit kernels in fq_init(), reported
    by kbuild test robot <lkp@intel.com>

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: sched: fallback to qdisc noqueue if default qdisc setup fail
Jesper Dangaard Brouer [Thu, 30 Apr 2020 11:42:22 +0000 (13:42 +0200)]
net: sched: fallback to qdisc noqueue if default qdisc setup fail

Currently if the default qdisc setup/init fails, the device ends up with
qdisc "noop", which causes all TX packets to get dropped.

With the introduction of sysctl net/core/default_qdisc it is possible
to change the default qdisc to be more advanced, which opens for the
possibility that Qdisc_ops->init() can fail.

This patch detect these kind of failures, and choose to fallback to
qdisc "noqueue", which is so simple that its init call will not fail.
This allows the interface to continue functioning.

V2:
As this also captures memory failures, which are transient, the
device is not kept in IFF_NO_QUEUE state.  This allows the net_device
to retry to default qdisc assignment.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-ipa-I-O-map-SMEM-and-IMEM'
David S. Miller [Mon, 4 May 2020 18:26:55 +0000 (11:26 -0700)]
Merge branch 'net-ipa-I-O-map-SMEM-and-IMEM'

Alex Elder says:

====================
net: ipa: I/O map SMEM and IMEM

This series adds the definition of two memory regions that must be
mapped for IPA to access through an SMMU.  It requires the SMMU to
be defined in the IPA node in the SoC's Device Tree file.

There is no change since version 1 to the content of the code in
these patches, *however* this time the first patch is an update to
the binding definition rather than an update to a DTS file.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ipa: define SMEM memory region for IPA
Alex Elder [Mon, 4 May 2020 17:58:59 +0000 (12:58 -0500)]
net: ipa: define SMEM memory region for IPA

Arrange to use an item from SMEM memory for IPA.  SMEM item number
497 is designated to be used by the IPA.  Specify the item ID and
size of the region in platform configuration data.  Allocate and get
a pointer to this region from ipa_mem_init().  The memory must be
mapped for access through an SMMU.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ipa: define IMEM memory region for IPA
Alex Elder [Mon, 4 May 2020 17:58:58 +0000 (12:58 -0500)]
net: ipa: define IMEM memory region for IPA

Define a region of IMEM memory available for use by IPA in the
platform configuration data.  Initialize it from ipa_mem_init().
The memory must be mapped for access through an SMMU.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ipa: redefine struct ipa_mem_data
Alex Elder [Mon, 4 May 2020 17:58:57 +0000 (12:58 -0500)]
net: ipa: redefine struct ipa_mem_data

The ipa_mem_data structure type was never actually used.  Instead,
the IPA memory regions were defined using the ipa_mem structure.

Redefine struct ipa_mem_data so it encapsulates the array of IPA-local
memory region descriptors along with the count of entries in that
array.  Pass just an ipa_mem structure pointer to ipa_mem_init().

Rename the ipa_mem_data[] array ipa_mem_local_data[] to emphasize
that the memory regions it defines are IPA-local memory.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agodt-bindings: net: add IPA iommus property
Alex Elder [Mon, 4 May 2020 17:58:56 +0000 (12:58 -0500)]
dt-bindings: net: add IPA iommus property

The IPA accesses "IMEM" and main system memory through an SMMU, so
its DT node requires an iommus property to define range of stream IDs
it uses.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-add-helper-eth_hw_addr_crc'
David S. Miller [Mon, 4 May 2020 18:19:58 +0000 (11:19 -0700)]
Merge branch 'net-add-helper-eth_hw_addr_crc'

Heiner Kallweit says:

====================
net: add helper eth_hw_addr_crc

Several drivers use the same code as basis for filter hashes. Therefore
let's factor it out to a helper. This way drivers don't have to access
struct netdev_hw_addr internals.

First user is r8169.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agor8169: use new helper eth_hw_addr_crc
Heiner Kallweit [Mon, 4 May 2020 17:28:21 +0000 (19:28 +0200)]
r8169: use new helper eth_hw_addr_crc

Use new helper eth_hw_addr_crc to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: add helper eth_hw_addr_crc
Heiner Kallweit [Mon, 4 May 2020 17:27:00 +0000 (19:27 +0200)]
net: add helper eth_hw_addr_crc

Several drivers use the same code as basis for filter hashes. Therefore
let's factor it out to a helper. This way drivers don't have to access
struct netdev_hw_addr internals.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: dsa: felix: allow the device to be disabled
Michael Walle [Mon, 4 May 2020 16:52:28 +0000 (18:52 +0200)]
net: dsa: felix: allow the device to be disabled

If there is no specific configuration of the felix switch in the device
tree, but only the default configuration (ie. given by the SoCs dtsi
file), the probe fails because no CPU port has been set. On the other
hand you cannot set a default CPU port because that depends on the
actual board using the switch.

[    2.701300] DSA: tree 0 has no CPU port
[    2.705167] mscc_felix 0000:00:00.5: Failed to register DSA switch: -22
[    2.711844] mscc_felix: probe of 0000:00:00.5 failed with error -22

Thus let the device tree disable this device entirely, like it is also
done with the enetc driver of the same SoC.

Signed-off-by: Michael Walle <michael@walle.cc>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-smc-add-failover-processing'
David S. Miller [Mon, 4 May 2020 17:54:40 +0000 (10:54 -0700)]
Merge branch 'net-smc-add-failover-processing'

Karsten Graul says:

====================
net/smc: add failover processing

This patch series adds the actual SMC-R link failover processing and
improved link group termination. There will be one more (very small)
series after this which will complete the SMC-R link failover support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: save SMC-R peer link_uid
Karsten Graul [Mon, 4 May 2020 12:18:48 +0000 (14:18 +0200)]
net/smc: save SMC-R peer link_uid

During SMC-R link establishment the peers exchange the link_uid that
is used for debugging purposes. Save the peer link_uid in smc_link so it
can be retrieved by the smc_diag netlink interface.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: create improved SMC-R link_uid
Karsten Graul [Mon, 4 May 2020 12:18:47 +0000 (14:18 +0200)]
net/smc: create improved SMC-R link_uid

The link_uid of an SMC-R link is exchanged between SMC peers and its
value can be used for debugging purposes. Create a unique link_uid
during link initialization and use it in communication with SMC-R peers.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: improve termination processing
Karsten Graul [Mon, 4 May 2020 12:18:46 +0000 (14:18 +0200)]
net/smc: improve termination processing

Add helper smcr_lgr_link_deactivate_all() and eliminate duplicate code.
In smc_lgr_free(), clear the smc-r links before smc_lgr_free_bufs() is
called so buffers are already prepared for free. The usage of the soft
parameter in __smc_lgr_terminate() is no longer needed, smc_lgr_free()
can be called directly. smc_lgr_terminate_sched() and
smc_smcd_terminate() set lgr->freeing to indicate that the link group
will be freed soon to avoid unnecessary schedules of the free worker.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: add termination reason and handle LLC protocol violation
Karsten Graul [Mon, 4 May 2020 12:18:45 +0000 (14:18 +0200)]
net/smc: add termination reason and handle LLC protocol violation

Allow to set the reason code for the link group termination, and set
meaningful values before termination processing is triggered. This
reason code is sent to the peer in the final delete link message.
When the LLC request or response layer receives a message type that was
not handled, drop a warning and terminate the link group.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: asymmetric link tagging
Karsten Graul [Mon, 4 May 2020 12:18:44 +0000 (14:18 +0200)]
net/smc: asymmetric link tagging

New connections must not be assigned to asymmetric links. Add asymmetric
link tagging using new link variable link_is_asym. The new helpers
smcr_lgr_set_type() and smcr_lgr_set_type_asym() are called to set the
state of the link group, and tag all links accordingly.
smcr_lgr_conn_assign_link() respects the link tagging and will not
assign new connections to links tagged as asymmetric link.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: assign link to a new connection
Karsten Graul [Mon, 4 May 2020 12:18:43 +0000 (14:18 +0200)]
net/smc: assign link to a new connection

For new connections, assign a link from the link group, using some
simple load balancing.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: send DELETE_LINK, ALL message and wait for send to complete
Karsten Graul [Mon, 4 May 2020 12:18:42 +0000 (14:18 +0200)]
net/smc: send DELETE_LINK, ALL message and wait for send to complete

Add smc_llc_send_message_wait() which uses smc_wr_tx_send_wait() to send
an LLC message and waits for the message send to complete.
smc_llc_send_link_delete_all() calls the new function to send an
DELETE_LINK,ALL LLC message. The RFC states that the sender of this type
of message needs to wait for the completion event of the message
transmission and can terminate the link afterwards.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: wait for departure of an IB message
Karsten Graul [Mon, 4 May 2020 12:18:41 +0000 (14:18 +0200)]
net/smc: wait for departure of an IB message

Introduce smc_wr_tx_send_wait() to send an IB message and wait for the
tx completion event of the message. This makes sure that the message is
no longer in-flight when the function returns.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: handle incoming CDC validation message
Karsten Graul [Mon, 4 May 2020 12:18:40 +0000 (14:18 +0200)]
net/smc: handle incoming CDC validation message

Call smc_cdc_msg_validate() when a CDC message with the failover
validation bit enabled was received. Validate that the sequence number
sent with the message is one we already have received. If not, messages
were lost and the connection is terminated using a new abort_work.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: send failover validation message
Karsten Graul [Mon, 4 May 2020 12:18:39 +0000 (14:18 +0200)]
net/smc: send failover validation message

When a connection is switched to a new link then a link validation
message must be sent to the peer over the new link, containing the
sequence number of the last CDC message that was sent over the old link.
The peer will validate if this sequence number is the same or lower then
the number he received, and abort the connection if messages were lost.
Add smcr_cdc_msg_send_validation() to send the message validation
message and call it when a connection was switched in
smc_switch_cursor().

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: switch connections to alternate link
Karsten Graul [Mon, 4 May 2020 12:18:38 +0000 (14:18 +0200)]
net/smc: switch connections to alternate link

Add smc_switch_conns() to switch all connections from a link that is
going down. Find an other link to switch the connections to, and
switch each connection to the new link. smc_switch_cursor() updates the
cursors of a connection to the state of the last successfully sent CDC
message. When there is no link to switch to, terminate the link group.
Call smc_switch_conns() when a link is going down.
And with the possibility that links of connections can switch adapt CDC
and TX functions to detect and handle link switches.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: save state of last sent CDC message
Karsten Graul [Mon, 4 May 2020 12:18:37 +0000 (14:18 +0200)]
net/smc: save state of last sent CDC message

When a link goes down and all connections of this link need to be
switched to an other link then the producer cursor and the sequence of
the last successfully sent CDC message must be known. Add the two fields
to the SMC connection and update it in the tx completion handler.
And to allow matching of sequences in error cases reset the seqno to the
old value in smc_cdc_msg_send() when the actual send failed.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'bnxt_en-Updates-for-net-next'
David S. Miller [Mon, 4 May 2020 17:44:11 +0000 (10:44 -0700)]
Merge branch 'bnxt_en-Updates-for-net-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next.

This patchset includes these main changes:

1. Firmware spec. update.
2. Context memory sizing improvements for the hardware TQM block.
3. ethtool chip reset improvements and fixes for correctness.
4. Improve L2 doorbell mapping by mapping only up to the size specified
by firmware.  This allows the RoCE driver to map the remaining doorbell
space for its purpose, such as write-combining.
5. Improve ethtool -S channel statistics by showing only relevant ring
counters for non-combined channels.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: show only relevant ethtool stats for a TX or RX ring
Rajesh Ravi [Mon, 4 May 2020 08:50:41 +0000 (04:50 -0400)]
bnxt_en: show only relevant ethtool stats for a TX or RX ring

Currently, ethtool -S shows all TX/RX ring counters whether the
channel is combined, RX, or TX.  The unused counters will always be
zero.  Improve it by showing only the relevant counters if the channel
is RX or TX.  If the channel is combined, the counters will be shown
exactly the same as before.

[ MChan: Lots of cleanups and simplifications on Rajesh's original
code]

Signed-off-by: Rajesh Ravi <rajesh.ravi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Split HW ring statistics strings into RX and TX parts.
Michael Chan [Mon, 4 May 2020 08:50:40 +0000 (04:50 -0400)]
bnxt_en: Split HW ring statistics strings into RX and TX parts.

This will allow the RX and TX ring statistics to be separated if needed.
In the next patch, we'll be able to only display RX or TX statistcis if
the channel is RX only or TX only.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Refactor the software ring counters.
Michael Chan [Mon, 4 May 2020 08:50:39 +0000 (04:50 -0400)]
bnxt_en: Refactor the software ring counters.

We currently have 3 software ring counters, rx_l4_csum_errors,
rx_buf_errors, and missed_irqs.  The 1st two are RX counters and the
last one is a common counter.  Organize them into 2 structures
bnxt_rx_sw_stats and bnxt_cmn_sw_stats.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Add doorbell information to bnxt_en_dev struct.
Michael Chan [Mon, 4 May 2020 08:50:38 +0000 (04:50 -0400)]
bnxt_en: Add doorbell information to bnxt_en_dev struct.

The purpose of this is to inform the RDMA driver the size of the doorbell
BAR that the L2 driver has mapped and the portion that is mapped
uncacheable.  The unchaeable portion is shared with the RoCE driver.
Any remaining unmapped doorbell BAR can be used by the RDMA driver for
its own purpose.  Currently, the entire L2 portion is mapped uncacheable.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Add support for L2 doorbell size.
Michael Chan [Mon, 4 May 2020 08:50:37 +0000 (04:50 -0400)]
bnxt_en: Add support for L2 doorbell size.

Read the L2 doorbell size from the firmware and only map the portion
of the doorbell BAR for L2 use.  This will leave the remaining doorbell
BAR available for the RoCE driver to use.  The RoCE driver can map
the remaining portion as write-combining to support the push feature.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Set the db_offset on 57500 chips for the RDMA MSIX entries.
Michael Chan [Mon, 4 May 2020 08:50:36 +0000 (04:50 -0400)]
bnxt_en: Set the db_offset on 57500 chips for the RDMA MSIX entries.

The driver provides completion ring or NQ doorbell offset for each
MSIX entry requested by the RDMA driver.  The NQ offset on 57500
chips is different than legacy chips.  Set it correctly based on
chip type for correctness.  The RDMA driver is ignoring this field
for the 57500 chips so it is not causing any problem.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Define the doorbell offsets on 57500 chips.
Michael Chan [Mon, 4 May 2020 08:50:35 +0000 (04:50 -0400)]
bnxt_en: Define the doorbell offsets on 57500 chips.

Define the 57500 chip doorbell offsets instead of using the magic
values in the C file.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Improve kernel log messages related to ethtool reset.
Edwin Peer [Mon, 4 May 2020 08:50:34 +0000 (04:50 -0400)]
bnxt_en: Improve kernel log messages related to ethtool reset.

Kernel log messages for failed AP reset commands should be suppressed.
These are expected to fail on devices that do not have an AP.  Add
missing driver reload message after AP reset and log it in a common
way without duplication.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: fix ethtool_reset_flags ABI violations
Edwin Peer [Mon, 4 May 2020 08:50:33 +0000 (04:50 -0400)]
bnxt_en: fix ethtool_reset_flags ABI violations

The ethtool ABI specifies that the reset operation should only clear
the flags that were actually reset. Setting the flags to zero after
a chip reset violates this because it does not include resetting the
application processor complex. Similarly, components that are not yet
defined are also not necessarily being reset.

The fact that chip reset does not cover the AP also means that it is
inappropriate to treat these two components exclusively of one another.
The ABI provides a mechanism to report a failure to reset independent
components via the returned bitmask, so it is also wrong to fail hard
if one of a set of independent resets is not possible.

It is incorrect to rely on the passed by reference flags in bnxt_reset(),
which are being updated as components are reset. The initially requested
value should be used instead so that hard errors do not propagate if any
earlier components could have been reset successfully.

Note, AP and chip resets are global in nature. Dedicated resets are
thus not currently supported.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: refactor ethtool firmware reset types
Edwin Peer [Mon, 4 May 2020 08:50:32 +0000 (04:50 -0400)]
bnxt_en: refactor ethtool firmware reset types

The case statement in bnxt_firmware_reset() dangerously mixes types.
This patch separates the application processor and whole chip resets
from the rest such that the selection is performed on a pure type.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: prepare to refactor ethtool reset types
Edwin Peer [Mon, 4 May 2020 08:50:31 +0000 (04:50 -0400)]
bnxt_en: prepare to refactor ethtool reset types

Extract bnxt_hwrm_firmware_reset() for performing firmware reset
operations. This new helper function will be used in a subsequent
patch to separate unrelated reset types out of bnxt_firmware_reset().

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Do not include ETH_FCS_LEN in the max packet length sent to fw.
Vasundhara Volam [Mon, 4 May 2020 08:50:30 +0000 (04:50 -0400)]
bnxt_en: Do not include ETH_FCS_LEN in the max packet length sent to fw.

The firmware does not expect the CRC to be included in the length
passed from the driver.  The firmware always configures the chip
to strip out the CRC.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Improve TQM ring context memory sizing formulas.
Michael Chan [Mon, 4 May 2020 08:50:29 +0000 (04:50 -0400)]
bnxt_en: Improve TQM ring context memory sizing formulas.

The current formulas to calculate the TQM slow path and fast path ring
context memory sizes are not quite correct.  TQM slow path entry is
array index 0 of ctx->tqm_mem[].  The other array entries are for fast
path.  Fix these sizes according to latest firmware spec. for 57500 and
newer chips.

Fixes: b7bc480e5c60 ("bnxt_en: Initialize context memory to the value specified by firmware.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Allocate TQM ring context memory according to fw specification.
Michael Chan [Mon, 4 May 2020 08:50:28 +0000 (04:50 -0400)]
bnxt_en: Allocate TQM ring context memory according to fw specification.

Newer firmware spec. will specify the number of TQM rings to allocate
context memory for.  Use the firmware specified value and fall back
to the old value derived from bp->max_q if it is not available.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agobnxt_en: Update firmware spec. to 1.10.1.33.
Michael Chan [Mon, 4 May 2020 08:50:27 +0000 (04:50 -0400)]
bnxt_en: Update firmware spec. to 1.10.1.33.

Changes include additional statistics, ECN support, context memory
interface change for better TQM context memory sizing, firmware
health status definitions, etc.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-smc-add-and-delete-link-processing'
David S. Miller [Sun, 3 May 2020 23:08:29 +0000 (16:08 -0700)]
Merge branch 'net-smc-add-and-delete-link-processing'

Karsten Graul says:

====================
net/smc: add and delete link processing

These patches add the 'add link' and 'delete link' processing as
SMC server and client. This processing allows to establish and
remove links of a link group dynamically.

v2: Fix mess up with unused static functions. Merge patch 8 into patch 4.
    Postpone patch 13 to next series.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: enqueue local LLC messages
Karsten Graul [Sun, 3 May 2020 12:38:50 +0000 (14:38 +0200)]
net/smc: enqueue local LLC messages

As SMC server, when a second link was deleted, trigger the setup of an
asymmetric link. Do this by enqueueing a local ADD_LINK message which
is processed by the LLC layer as if it were received from peer. Do the
same when a new IB port became active and a new link could be created.
smc_llc_srv_add_link_local() enqueues a local ADD_LINK message.
And smc_llc_srv_delete_link_local() is used the same way to enqueue a
local DELETE_LINK message. This is used when an IB port is no longer
active.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: delete link processing as SMC server
Karsten Graul [Sun, 3 May 2020 12:38:49 +0000 (14:38 +0200)]
net/smc: delete link processing as SMC server

Add smc_llc_process_srv_delete_link() to process a DELETE_LINK request
as SMC server. When the request is to delete ALL links then terminate
the whole link group. If not, find the link to delete by its link_id,
send the DELETE_LINK request LLC message and wait for the response.
No matter if a response was received, clear the deleted link and update
the link group state.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: delete link processing as SMC client
Karsten Graul [Sun, 3 May 2020 12:38:48 +0000 (14:38 +0200)]
net/smc: delete link processing as SMC client

Add smc_llc_process_cli_delete_link() to process a DELETE_LINK request
as SMC client. When the request is to delete ALL links then terminate
the whole link group. If not, find the link to delete by its link_id,
send the DELETE_LINK response LLC message and then clear the deleted
link. Finally determine and update the link group state.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: llc_del_link_work and use the LLC flow for delete link
Karsten Graul [Sun, 3 May 2020 12:38:47 +0000 (14:38 +0200)]
net/smc: llc_del_link_work and use the LLC flow for delete link

Introduce a work that is scheduled when a new DELETE_LINK LLC request is
received. The work will call either the SMC client or SMC server
DELETE_LINK processing.
And use the LLC flow framework to process incoming DELETE_LINK LLC
messages, scheduling the llc_del_link_work for those events.
With these changes smc_lgr_forget() is only called by one function and
can be migrated into smc_lgr_cleanup_early().

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: delete an asymmetric link as SMC server
Karsten Graul [Sun, 3 May 2020 12:38:46 +0000 (14:38 +0200)]
net/smc: delete an asymmetric link as SMC server

When a link group moved from asymmetric to symmetric state then the
dangling asymmetric link can be deleted. Add smc_llc_find_asym_link() to
find the respective link and add smc_llc_delete_asym_link() to delete
it.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: final part of add link processing as SMC server
Karsten Graul [Sun, 3 May 2020 12:38:45 +0000 (14:38 +0200)]
net/smc: final part of add link processing as SMC server

This patch finalizes the ADD_LINK processing of new links. Send the
CONFIRM_LINK request to the peer, receive the response and set link
state to ACTIVE.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: rkey processing for a new link as SMC server
Karsten Graul [Sun, 3 May 2020 12:38:44 +0000 (14:38 +0200)]
net/smc: rkey processing for a new link as SMC server

Part of SMC server new link establishment is the exchange of rkeys for
used buffers.
Loop over all used RMB buffers and send ADD_LINK_CONTINUE LLC messages
to the peer.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: first part of add link processing as SMC server
Karsten Graul [Sun, 3 May 2020 12:38:43 +0000 (14:38 +0200)]
net/smc: first part of add link processing as SMC server

First set of functions to process an ADD_LINK LLC request as an SMC
server. Find an alternate IB device, determine the new link group type
and get the index for the new link. Then initialize the link and send
the ADD_LINK LLC message to the peer. Save the contents of the response,
ready the link, map all used buffers and register the buffers with the
IB device. If any error occurs, stop the processing and clear the link.
And call smc_llc_srv_add_link() in af_smc.c to start second link
establishment after the initial link of a link group was created.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: final part of add link processing as SMC client
Karsten Graul [Sun, 3 May 2020 12:38:42 +0000 (14:38 +0200)]
net/smc: final part of add link processing as SMC client

This patch finalizes the ADD_LINK processing of new links. Receive the
CONFIRM_LINK request from peer, complete the link initialization,
register all used buffers with the IB device and finally send the
CONFIRM_LINK response, which completes the ADD_LINK processing.
And activate smc_llc_cli_add_link() in af_smc.c.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: rkey processing for a new link as SMC client
Karsten Graul [Sun, 3 May 2020 12:38:41 +0000 (14:38 +0200)]
net/smc: rkey processing for a new link as SMC client

Part of the SMC client new link establishment process is the exchange of
rkeys for all used buffers.
Add new LLC message type ADD_LINK_CONTINUE which is used to exchange
rkeys of all current RMB buffers. Add functions to iterate over all
used RMB buffers of the link group, and implement the ADD_LINK_CONTINUE
processing.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: first part of add link processing as SMC client
Karsten Graul [Sun, 3 May 2020 12:38:40 +0000 (14:38 +0200)]
net/smc: first part of add link processing as SMC client

First set of functions to process an ADD_LINK LLC request as an SMC
client. Find an alternate IB device, determine the new link group type
and get the index for the new link. Then ready the link, map the buffers
and send an ADD_LINK LLC response. If any error occurs, send a reject
LLC message and terminate the processing.
Add smc_llc_alloc_alt_link() to find a free link index for a new link,
depending on the new link group type.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'Enhance-current-features-in-ena-driver'
David S. Miller [Sun, 3 May 2020 22:59:30 +0000 (15:59 -0700)]
Merge branch 'Enhance-current-features-in-ena-driver'

Sameeh Jubran says:

====================
Enhance current features in ena driver

Difference from v2:
* dropped patch "net: ena: move llq configuration from ena_probe to ena_device_init()"
* reworked patch ""net: ena: implement ena_com_get_admin_polling_mode() to drop the prototype

Difference from v1:
* reodered paches #01 and #02.
* dropped adding Rx/Tx drops to ethtool in patch #08

V1:
This patchset introduces the following:
* minor changes to RSS feature
* add total rx and tx drop counter
* add unmask_interrupt counter for ethtool statistics
* add missing implementation for ena_com_get_admin_polling_mode()
* some minor code clean-up and cosmetics
* use SHUTDOWN as reset reason when closing interface
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: extract code to ena_indirection_table_set()
Arthur Kiyanovski [Sun, 3 May 2020 09:52:21 +0000 (09:52 +0000)]
net: ena: cosmetic: extract code to ena_indirection_table_set()

Extract code to ena_indirection_table_set() to make
the code cleaner.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: cosmetic: remove unnecessary spaces and tabs in ena_com.h macros
Sameeh Jubran [Sun, 3 May 2020 09:52:20 +0000 (09:52 +0000)]
net: ena: cosmetic: remove unnecessary spaces and tabs in ena_com.h macros

The macros in ena_com.h have inconsistent spaces between
the macro name and it's value.

This commit sets all the macros to have a single space between
the name and value.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: use SHUTDOWN as reset reason when closing interface
Sameeh Jubran [Sun, 3 May 2020 09:52:19 +0000 (09:52 +0000)]
net: ena: use SHUTDOWN as reset reason when closing interface

The 'ENA_REGS_RESET_SHUTDOWN' enum indicates a normal driver
shutdown / removal procedure.

Also, a comment is added to one of the reset reason assignments for
code clarity.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: drop superfluous prototype
Arthur Kiyanovski [Sun, 3 May 2020 09:52:18 +0000 (09:52 +0000)]
net: ena: drop superfluous prototype

Before this commit there was a function prototype named
ena_com_get_ena_admin_polling_mode() that was never implemented.

This patch simply deletes it.

Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: add support for reporting of packet drops
Sameeh Jubran [Sun, 3 May 2020 09:52:17 +0000 (09:52 +0000)]
net: ena: add support for reporting of packet drops

1. Add support for getting tx drops from the device and saving them
in the driver.
2. Report tx via netdev stats.

Signed-off-by: Igor Chauskin <igorch@amazon.com>
Signed-off-by: Guy Tzalik <gtzalik@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: add unmask interrupts statistics to ethtool
Sameeh Jubran [Sun, 3 May 2020 09:52:16 +0000 (09:52 +0000)]
net: ena: add unmask interrupts statistics to ethtool

Add unmask interrupts statistics to ethtool.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: remove code that does nothing
Sameeh Jubran [Sun, 3 May 2020 09:52:15 +0000 (09:52 +0000)]
net: ena: remove code that does nothing

Both key and func parameters are pointers on the stack.
Setting them to NULL does nothing.
The original intent was to leave the key and func unset in this case,
but for this to happen nothing needs to be done as the calling
function ethtool_get_rxfh() already clears key and func.

This commit removes the above described useless code.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: changes to RSS hash key allocation
Sameeh Jubran [Sun, 3 May 2020 09:52:14 +0000 (09:52 +0000)]
net: ena: changes to RSS hash key allocation

This commit contains 2 cosmetic changes:

1. Use ena_com_check_supported_feature_id() in
   ena_com_hash_key_fill_default_key() instead of rewriting
   its implementation. This also saves us a superfluous admin
   command by using the cached value.

2. Change if conditions in ena_com_rss_init() to be clearer.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: change default RSS hash function to Toeplitz
Arthur Kiyanovski [Sun, 3 May 2020 09:52:13 +0000 (09:52 +0000)]
net: ena: change default RSS hash function to Toeplitz

Currently in the driver we are setting the hash function to be CRC32.
Starting with this commit we want to change the default behaviour so that
we set the hash function to be Toeplitz instead.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: allow setting the hash function without changing the key
Sameeh Jubran [Sun, 3 May 2020 09:52:12 +0000 (09:52 +0000)]
net: ena: allow setting the hash function without changing the key

Current code does not allow setting the hash function without
changing the key. This commit enables it.

To achieve this we separate ena_com_get_hash_function() to 2 functions:
ena_com_get_hash_function() - which gets only the hash function, and
ena_com_get_hash_key() - which gets only the hash key.

Also return 0 instead of rc at the end of ena_get_rxfh() since all
previous operations succeeded.

Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: fix error returning in ena_com_get_hash_function()
Arthur Kiyanovski [Sun, 3 May 2020 09:52:11 +0000 (09:52 +0000)]
net: ena: fix error returning in ena_com_get_hash_function()

In case the "func" parameter is NULL we now return "-EINVAL".
This shouldn't happen in general, but when it does happen, this is the
proper way to handle it.

We also check func for NULL in the beginning of the function, as there
is no reason to do all the work and realize in the end of the function
it was useless.

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ena: avoid unnecessary admin command when RSS function set fails
Arthur Kiyanovski [Sun, 3 May 2020 09:52:10 +0000 (09:52 +0000)]
net: ena: avoid unnecessary admin command when RSS function set fails

Currently when ena_set_hash_function() fails the hash function is
restored to the previous value by calling an admin command to get
the hash function from the device.

In this commit we avoid the admin command, by saving the previous
hash function before calling ena_set_hash_function() and using this
previous value to restore the hash function in case of failure of
ena_set_hash_function().

Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'sch_fq-optimizations'
David S. Miller [Sun, 3 May 2020 22:50:53 +0000 (15:50 -0700)]
Merge branch 'sch_fq-optimizations'

Eric Dumazet says:

====================
net_sched: sch_fq: round of optimizations

This series is focused on better layout of struct fq_flow to
reduce number of cache line misses in fq_enqueue() and dequeue operations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet_sched: sch_fq: perform a prefetch() earlier
Eric Dumazet [Sun, 3 May 2020 02:54:22 +0000 (19:54 -0700)]
net_sched: sch_fq: perform a prefetch() earlier

The prefetch() done in fq_dequeue() can be done a bit earlier
after the refactoring of the code done in the prior patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet_sched: sch_fq: do not call fq_peek() twice per packet
Eric Dumazet [Sun, 3 May 2020 02:54:21 +0000 (19:54 -0700)]
net_sched: sch_fq: do not call fq_peek() twice per packet

This refactors the code to not call fq_peek() from fq_dequeue_head()
since the caller can provide the skb.

Also rename fq_dequeue_head() to fq_dequeue_skb() because 'head' is
a bit vague, given the skb could come from t_root rb-tree.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet_sched: sch_fq: use bulk freeing in fq_gc()
Eric Dumazet [Sun, 3 May 2020 02:54:20 +0000 (19:54 -0700)]
net_sched: sch_fq: use bulk freeing in fq_gc()

fq_gc() already builds a small array of pointers, so using
kmem_cache_free_bulk() needs very little change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet_sched: sch_fq: change fq_flow size/layout
Eric Dumazet [Sun, 3 May 2020 02:54:19 +0000 (19:54 -0700)]
net_sched: sch_fq: change fq_flow size/layout

sizeof(struct fq_flow) is 112 bytes on 64bit arches.

This means that half of them use two cache lines, but 50% use
three cache lines.

This patch adds cache line alignment, and makes sure that only
the first cache line is touched by fq_enqueue(), which is more
expensive that fq_dequeue() in general.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet_sched: sch_fq: avoid touching f->next from fq_gc()
Eric Dumazet [Sun, 3 May 2020 02:54:18 +0000 (19:54 -0700)]
net_sched: sch_fq: avoid touching f->next from fq_gc()

A significant amount of cpu cycles is spent in fq_gc()

When fq_gc() does its lookup in the rb-tree, it needs the
following fields from struct fq_flow :

f->sk       (lookup key in the rb-tree)
f->fq_node  (anchor in the rb-tree)
f->next     (used to determine if the flow is detached)
f->age      (used to determine if the flow is candidate for gc)

This unfortunately spans two cache lines (assuming 64 bytes cache lines)

We can avoid using f->next, if we use the low order bit of f->{age|tail}

This low order bit is 0, if f->tail points to an sk_buff.
We set the low order bit to 1, if the union contains a jiffies value.

Combined with the following patch, this makes sure we only need
to bring into cpu caches one cache line per flow.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoinet_diag: bc: read cgroup id only for full sockets
Dmitry Yakunin [Sat, 2 May 2020 15:34:42 +0000 (18:34 +0300)]
inet_diag: bc: read cgroup id only for full sockets

Fix bug introduced by commit 13a1aaf238ae ("inet_diag: add support for
cgroup filter").

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Reported-by: syzbot+ee80f840d9bf6893223b@syzkaller.appspotmail.com
Reported-by: syzbot+13bef047dbfffa5cd1af@syzkaller.appspotmail.com
Fixes: 13a1aaf238ae ("inet_diag: add support for cgroup filter")
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet: ethernet: fec: Replace interrupt driven MDIO with polled IO
Andrew Lunn [Sat, 2 May 2020 15:25:04 +0000 (17:25 +0200)]
net: ethernet: fec: Replace interrupt driven MDIO with polled IO

Measurements of the MDIO bus have shown that driving the MDIO bus
using interrupts is slow. Back to back MDIO transactions take about
90us, with 25us spent performing the transaction, and the remainder of
the time the bus is idle.

Replacing the completion interrupt with polled IO results in back to
back transactions of 40us. The polling loop waiting for the hardware
to complete the transaction takes around 28us. Which suggests
interrupt handling has an overhead of 50us, and polled IO nearly
halves this overhead, and doubles the MDIO performance.

Care has to be taken when setting the MII_SPEED register, or it can
trigger an MII event> That then upsets the polling, due to an
unexpected pending event.

Suggested-by: Chris Heally <cphealy@gmail.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agosmc: Remove unused function.
David S. Miller [Sat, 2 May 2020 23:41:06 +0000 (16:41 -0700)]
smc: Remove unused function.

net/smc/smc_llc.c:544:12: warning: ‘smc_llc_alloc_alt_link’ defined but not used [-Wunused-function]
 static int smc_llc_alloc_alt_link(struct smc_link_group *lgr,
            ^~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'ptp-Add-adjust-phase-to-support-phase-offset'
David S. Miller [Sat, 2 May 2020 23:31:45 +0000 (16:31 -0700)]
Merge branch 'ptp-Add-adjust-phase-to-support-phase-offset'

Vincent Cheng says:

====================
ptp: Add adjust phase to support phase offset.

This series adds adjust phase to the PTP Hardware Clock device interface.

Some PTP hardware clocks have a write phase mode that has
a built-in hardware filtering capability.  The write phase mode
utilizes a phase offset control word instead of a frequency offset
control word.  Add adjust phase function to take advantage of this
capability.

Changes since v1:
- As suggested by Richard Cochran:
  1. ops->adjphase is new so need to check for non-null function pointer.
  2. Kernel coding style uses lower_case_underscores.
  3. Use existing PTP clock API for delayed worker.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
Vincent Cheng [Sat, 2 May 2020 03:35:38 +0000 (23:35 -0400)]
ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.

Add idtcm_adjphase() to support PHC write phase mode.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoptp: Add adjust_phase to ptp_clock_caps capability.
Vincent Cheng [Sat, 2 May 2020 03:35:37 +0000 (23:35 -0400)]
ptp: Add adjust_phase to ptp_clock_caps capability.

Add adjust_phase to ptp_clock_caps capability to allow
user to query if a PHC driver supports adjust phase with
ioctl PTP_CLOCK_GETCAPS command.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoptp: Add adjphase function to support phase offset control.
Vincent Cheng [Sat, 2 May 2020 03:35:36 +0000 (23:35 -0400)]
ptp: Add adjphase function to support phase offset control.

Adds adjust phase function to take advantage of a PHC
clock's hardware filtering capability that uses phase offset
control word instead of frequency offset control word.

Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
David S. Miller [Sat, 2 May 2020 00:02:27 +0000 (17:02 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Alexei Starovoitov says:

====================
pull-request: bpf-next 2020-05-01 (v2)

The following pull-request contains BPF updates for your *net-next* tree.

We've added 61 non-merge commits during the last 6 day(s) which contain
a total of 153 files changed, 6739 insertions(+), 3367 deletions(-).

The main changes are:

1) pulled work.sysctl from vfs tree with sysctl bpf changes.

2) bpf_link observability, from Andrii.

3) BTF-defined map in map, from Andrii.

4) asan fixes for selftests, from Andrii.

5) Allow bpf_map_lookup_elem for SOCKMAP and SOCKHASH, from Jakub.

6) production cloudflare classifier as a selftes, from Lorenz.

7) bpf_ktime_get_*_ns() helper improvements, from Maciej.

8) unprivileged bpftool feature probe, from Quentin.

9) BPF_ENABLE_STATS command, from Song.

10) enable bpf_[gs]etsockopt() helpers for sock_ops progs, from Stanislav.

11) enable a bunch of common helpers for cg-device, sysctl, sockopt progs,
 from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge branch 'net-smc-extent-buffer-mapping-and-port-handling'
David S. Miller [Fri, 1 May 2020 23:20:05 +0000 (16:20 -0700)]
Merge branch 'net-smc-extent-buffer-mapping-and-port-handling'

Karsten Graul says:

====================
net/smc: extent buffer mapping and port handling

Add functionality to map/unmap and register/unregister memory buffers for
specific SMC-R links and for the whole link group. Prepare LLC layer messages
for the support of multiple links and extent the processing of adapter events.
And add further small preparations needed for the SMC-R failover support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoselftests/bpf: Use reno instead of dctcp
Stanislav Fomichev [Fri, 1 May 2020 22:43:20 +0000 (15:43 -0700)]
selftests/bpf: Use reno instead of dctcp

Andrey pointed out that we can use reno instead of dctcp for CC
tests and drop CONFIG_TCP_CONG_DCTCP=y requirement.

Fixes: ebf63dc69a9c ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr")
Suggested-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
4 years agonet/smc: llc_add_link_work to handle ADD_LINK LLC requests
Karsten Graul [Fri, 1 May 2020 10:48:13 +0000 (12:48 +0200)]
net/smc: llc_add_link_work to handle ADD_LINK LLC requests

Introduce a work that is scheduled when a new ADD_LINK LLC request is
received. The work will call either the SMC client or SMC server
ADD_LINK processing.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: allocate index for a new link
Karsten Graul [Fri, 1 May 2020 10:48:12 +0000 (12:48 +0200)]
net/smc: allocate index for a new link

Add smc_llc_alloc_alt_link() to find a free link index for a new link,
depending on the new link group type. And update constants for the
maximum number of links to 3 (2 symmetric and 1 dangling asymmetric link).
These maximum numbers are the same as used by other implementations of the
SMC-R protocol.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agonet/smc: introduce smc_pnet_find_alt_roce()
Karsten Graul [Fri, 1 May 2020 10:48:11 +0000 (12:48 +0200)]
net/smc: introduce smc_pnet_find_alt_roce()

Introduce a new function in smc_pnet.c that searches for an alternate
IB device, using an existing link group and a primary IB device. The
alternate IB device needs to be active and must have the same PNETID
as the link group.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>