====================
mptcp: MPTCP support for TCP_FASTOPEN_CONNECT
RFC 8684 appendix B describes how to use TCP Fast Open with MPTCP. This
series allows TFO use with MPTCP using the TCP_FASTOPEN_CONNECT socket
option. The scope here is limited to the initiator of the connection -
support for MSG_FASTOPEN and the listener side of the connection will be
in a separate series. The preexisting TCP fastopen code does most of the
work, so these changes mostly involve plumbing MPTCP through to those
TCP functions.
Patch 1 changes the MPTCP socket option code to pass the
TCP_FASTOPEN_CONNECT option through to the initial unconnected subflow.
Patch 2 exports the existing tcp_sendmsg_fastopen() function from tcp.c.
Patch 3 adds the call to tcp_sendmsg_fastopen() from the MPTCP send
function.
Patch 4 modifies mptcp_poll() to handle the deferred TFO connection.
====================
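For illustration only, a userspace sketch of how an initiator might request
TFO on an MPTCP socket with the TCP_FASTOPEN_CONNECT option. The helper name
mptcp_tfo_socket() is hypothetical, and the fallback #defines are only used
when the libc headers lack the constants. On kernels without this series the
setsockopt() call is expected to fail and the helper returns -1.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for older libc headers (values as in the Linux UAPI headers). */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif

/*
 * Hypothetical helper: create an MPTCP socket and enable TFO on connect.
 * Returns the socket fd, or -1 with errno set if MPTCP or the option
 * is not supported by the running kernel.
 */
static int mptcp_tfo_socket(void)
{
	int one = 1;
	int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_CONNECT,
		       &one, sizeof(one)) < 0) {
		int saved = errno;

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

With the option set, the caller would connect() as usual and the first
write on the socket is what triggers the SYN carrying data.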
Benjamin Hesmans [Mon, 26 Sep 2022 23:27:39 +0000 (16:27 -0700)]
mptcp: poll allow write call before actual connect
If fastopen is used, poll must allow a first write that will trigger
the SYN+data.
This is similar to what is done in tcp_poll().
Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Benjamin Hesmans [Mon, 26 Sep 2022 23:27:36 +0000 (16:27 -0700)]
mptcp: add TCP_FASTOPEN_CONNECT socket option
Set the option for the first subflow only. For the other subflows TFO
can't be used because a mapping would be needed to cover the data in the
SYN.
Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Benjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netns: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members instead. So, replace the zero-length array
declarations in the anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.
This helper allows for flexible-array members in unions.
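As a rough userspace illustration of the idea: standard C does not allow a
flexible-array member directly inside a union, so a helper can wrap it in an
anonymous struct alongside an empty placeholder member. FLEX_ARRAY_SKETCH()
and struct frame_sketch below are stand-ins invented for this sketch, not the
kernel's definitions, and the empty struct relies on a GNU C extension
(GCC/Clang).

```c
#include <stddef.h>

/*
 * Stand-in for a flex-array helper: the empty struct gives the anonymous
 * struct a named member, so a flexible array may follow it.
 */
#define FLEX_ARRAY_SKETCH(TYPE, NAME)		\
	struct {				\
		struct { } empty_##NAME;	\
		TYPE NAME[];			\
	}

/* Hypothetical structure with two views of the same trailing data. */
struct frame_sketch {
	unsigned int len;
	union {
		FLEX_ARRAY_SKETCH(unsigned char, bytes);
		FLEX_ARRAY_SKETCH(unsigned short, halfwords);
	};
};

/* Both flexible arrays start at the same offset within the structure. */
static int views_share_offset(void)
{
	return offsetof(struct frame_sketch, bytes) ==
	       offsetof(struct frame_sketch, halfwords);
}
```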
Jakub Kicinski [Thu, 29 Sep 2022 01:51:27 +0000 (18:51 -0700)]
Merge branch 'shrink-struct-ubuf_info'
Pavel Begunkov says:
====================
shrink struct ubuf_info
struct ubuf_info is large but not all fields are needed for all
cases. We have limited space in io_uring for it and large ubuf_info
prevents some struct embedding, even though we use only a subset
of the fields. It's also not very clean trying to use this typeless
extra space.
Shrink struct ubuf_info to only necessary fields used in generic paths,
namely ->callback, ->refcnt and ->flags, which take only 16 bytes. And
make MSG_ZEROCOPY and some other users embed it into a larger struct
ubuf_info_msgzc mimicking the former ubuf_info.
Note, xen/vhost may also have some cleaning on top by creating
new structs containing ubuf_info but with proper types.
====================
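A minimal userspace sketch of the embedding pattern described above, assuming
a small generic structure with a callback, a reference count and flags, and a
larger user-specific structure that contains it. All names here
(base_info_sketch, zc_info_sketch, container_of_sketch) are hypothetical and
the field types are simplified; this is not the kernel's struct layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic part: only what common code paths need. */
struct base_info_sketch {
	void (*callback)(struct base_info_sketch *info, int success);
	uint32_t refcnt;
	uint8_t flags;
};

/* User-specific extension embedding the generic part. */
struct zc_info_sketch {
	struct base_info_sketch base;
	uint32_t id;
	uint16_t len;
	int done;		/* set by the completion callback */
};

/* Recover the enclosing structure from a pointer to its member. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code only sees the base pointer; the user recovers its own type. */
static void zc_complete(struct base_info_sketch *info, int success)
{
	struct zc_info_sketch *zc =
		container_of_sketch(info, struct zc_info_sketch, base);

	zc->done = success ? 1 : -1;
}
```

The design choice is that generic code pays only for the small base
structure, while each user keeps properly typed extra fields in its own
wrapper.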
Pavel Begunkov [Fri, 23 Sep 2022 16:39:04 +0000 (17:39 +0100)]
net: shrink struct ubuf_info
We can benefit from a smaller struct ubuf_info, so leave only mandatory
fields and let users decide how they want to extend it. Convert
MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields.
This reduces the size from 48 bytes to just 16.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pavel Begunkov [Fri, 23 Sep 2022 16:39:01 +0000 (17:39 +0100)]
net: introduce struct ubuf_info_msgzc
We're going to split struct ubuf_info and leave there only
mandatory fields. Users are free to extend it. Add struct
ubuf_info_msgzc, which will be an extended version for MSG_ZEROCOPY and
some other users. It duplicates struct ubuf_info for now; the duplicated
fields will be removed in a couple of patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 28 Sep 2022 16:46:35 +0000 (09:46 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Florian Westphal says:
====================
netfilter fix for net-next
This is a late bug fix for the *net-next* tree to make nftables
"fib" expression play nice with VRF devices.
This was broken since day 1 (v4.10) so I don't see a compelling reason
to push this via net at the last minute.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nft_fib: Fix for rpath check with VRF devices
====================
Phil Sutter [Wed, 21 Sep 2022 11:07:31 +0000 (13:07 +0200)]
netfilter: nft_fib: Fix for rpath check with VRF devices
Analogous to commit 07980f538cd5d ("netfilter: Fix rpfilter
dropping vrf packets by mistake") but for nftables fib expression:
Add special treatment of VRF devices so that typical reverse path
filtering via 'fib saddr . iif oif' expression works as expected.
David S. Miller [Wed, 28 Sep 2022 08:43:22 +0000 (09:43 +0100)]
Merge branch 'sfc-tc-offload'
Edward Cree says:
====================
sfc: bare bones TC offload
This series begins the work of supporting TC flower offload on EF100 NICs.
This is the absolute minimum viable TC implementation to get traffic to
VFs and allow them to be tested; it supports no match fields besides
ingress port, no actions besides mirred and drop, and no stats.
More matches, actions, and counters will be added in subsequent patches.
Changed in v2:
- Add missing 'static' on declarations (kernel test robot, sparse)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 26 Sep 2022 18:57:36 +0000 (19:57 +0100)]
sfc: bare bones TC offload on EF100
This is the absolute minimum viable TC implementation to get traffic to
VFs and allow them to be tested; it supports no match fields besides
ingress port, no actions besides mirred and drop, and no stats.
Example usage:
tc filter add dev $PF parent ffff: flower skip_sw \
action mirred egress mirror dev $VFREP
tc filter add dev $VFREP parent ffff: flower skip_sw \
action mirred egress redirect dev $PF
gives a VF unfiltered access to the network out the physical port ($PF
acts here as a physical port representor).
More matches, actions, and counters will be added in subsequent patches.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 26 Sep 2022 18:57:35 +0000 (19:57 +0100)]
sfc: interrogate MAE capabilities at probe time
Different versions of EF100 firmware and FPGA bitstreams support different
matching capabilities in the Match-Action Engine. Probe for these at
start of day; subsequent patches will validate TC offload requests
against the reported capabilities.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 26 Sep 2022 18:57:34 +0000 (19:57 +0100)]
sfc: add a hashtable for offloaded TC rules
Nothing inserts into this table yet, but we have code to remove rules
on FLOW_CLS_DESTROY or at driver teardown time, in both cases also
attempting to remove the corresponding hardware rules.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 26 Sep 2022 18:57:33 +0000 (19:57 +0100)]
sfc: optional logging of TC offload errors
TC offload support will involve complex limitations on what matches and
actions a rule can do, in some cases potentially depending on rules
already offloaded. So add an ethtool private flag "log-tc-errors" which
controls reporting the reasons for un-offloadable TC rules at NETIF_INFO.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 26 Sep 2022 18:57:32 +0000 (19:57 +0100)]
sfc: bind indirect blocks for TC offload on EF100
Bind indirect blocks for recognised tunnel netdevices.
Currently these connect to a stub efx_tc_flower() that only returns
-EOPNOTSUPP; subsequent patches will implement flower offloads to the
Match-Action Engine.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Mon, 26 Sep 2022 18:57:31 +0000 (19:57 +0100)]
sfc: bind blocks for TC offload on EF100
Bind direct blocks for the MAE-admin PF and each VF representor.
Currently these connect to a stub efx_tc_flower() that only returns
-EOPNOTSUPP; subsequent patches will implement flower offloads to the
Match-Action Engine.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: rmnet: Replace zero-length array with DECLARE_FLEX_ARRAY() helper
Zero-length arrays are deprecated and we are moving towards adopting
C99 flexible-array members instead. So, replace the zero-length array
declarations in the anonymous union with the new DECLARE_FLEX_ARRAY()
helper macro.
This helper allows for flexible-array members in unions.
The lan966x switch supports a credit-based shaper in hardware according to
IEEE Std 802.1Q-2018 Section 8.6.8.2. Add support for CBS configuration
on the egress port of the lan966x switch.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The tbf qdisc allows attaching a shaper to egress traffic on a port or
on a queue. On a port, the shaper is attached directly to the root; on a
queue, it is attached to one of the classes of the parent qdisc.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Sep 2022 07:32:55 +0000 (08:32 +0100)]
Merge branch 'tc-testing-qdisc'
Zhengchao Shao says:
====================
net: add tc-testing qdisc test cases
For this patchset, test cases of the qdisc modules are added to the
tc-testing test suite.
Finally, thanks to Victor for testing and suggestions.
After a test case is added locally, the test result is as follows:
./tdc.py -c atm
ok 1 7628 - Create ATM with default setting
ok 2 390a - Delete ATM with valid handle
ok 3 32a0 - Show ATM class
ok 4 6310 - Dump ATM stats
./tdc.py -c choke
ok 1 8937 - Create CHOKE with default setting
ok 2 48c0 - Create CHOKE with min packet setting
ok 3 38c1 - Create CHOKE with max packet setting
ok 4 234a - Create CHOKE with ecn setting
ok 5 4380 - Create CHOKE with burst setting
ok 6 48c7 - Delete CHOKE with valid handle
ok 7 4398 - Replace CHOKE with min setting
ok 8 0301 - Change CHOKE with limit setting
./tdc.py -c codel
ok 1 983a - Create CODEL with default setting
ok 2 38aa - Create CODEL with limit packet setting
ok 3 9178 - Create CODEL with target setting
ok 4 78d1 - Create CODEL with interval setting
ok 5 238a - Create CODEL with ecn setting
ok 6 939c - Create CODEL with ce_threshold setting
ok 7 8380 - Delete CODEL with valid handle
ok 8 289c - Replace CODEL with limit setting
ok 9 0648 - Change CODEL with limit setting
./tdc.py -c etf
ok 1 34ba - Create ETF with default setting
ok 2 438f - Create ETF with delta nanos setting
ok 3 9041 - Create ETF with deadline_mode setting
ok 4 9a0c - Create ETF with skip_sock_check setting
ok 5 2093 - Delete ETF with valid handle
./tdc.py -c fq
ok 1 983b - Create FQ with default setting
ok 2 38a1 - Create FQ with limit packet setting
ok 3 0a18 - Create FQ with flow_limit setting
ok 4 2390 - Create FQ with quantum setting
ok 5 845b - Create FQ with initial_quantum setting
ok 6 9398 - Create FQ with maxrate setting
ok 7 342c - Create FQ with nopacing setting
ok 8 6391 - Create FQ with refill_delay setting
ok 9 238b - Create FQ with low_rate_threshold setting
ok 10 7582 - Create FQ with orphan_mask setting
ok 11 4894 - Create FQ with timer_slack setting
ok 12 324c - Create FQ with ce_threshold setting
ok 13 424a - Create FQ with horizon time setting
ok 14 89e1 - Create FQ with horizon_cap setting
ok 15 32e1 - Delete FQ with valid handle
ok 16 49b0 - Replace FQ with limit setting
ok 17 9478 - Change FQ with limit setting
./tdc.py -c gred
ok 1 8942 - Create GRED with default setting
ok 2 5783 - Create GRED with grio setting
ok 3 8a09 - Create GRED with limit setting
ok 4 48cb - Create GRED with ecn setting
ok 5 763a - Change GRED setting
ok 6 8309 - Show GRED class
./tdc.py -c hhf
ok 1 4812 - Create HHF with default setting
ok 2 8a92 - Create HHF with limit setting
ok 3 3491 - Create HHF with quantum setting
ok 4 ba04 - Create HHF with reset_timeout setting
ok 5 4238 - Create HHF with admit_bytes setting
ok 6 839f - Create HHF with evict_timeout setting
ok 7 a044 - Create HHF with non_hh_weight setting
ok 8 32f9 - Change HHF with limit setting
ok 9 385e - Show HHF class
./tdc.py -c pfifo_fast
ok 1 900c - Create pfifo_fast with default setting
ok 2 7470 - Dump pfifo_fast stats
ok 3 b974 - Replace pfifo_fast with different handle
ok 4 3240 - Delete pfifo_fast with valid handle
ok 5 4385 - Delete pfifo_fast with invalid handle
./tdc.py -c plug
ok 1 3289 - Create PLUG with default setting
ok 2 0917 - Create PLUG with block setting
ok 3 483b - Create PLUG with release setting
ok 4 4995 - Create PLUG with release_indefinite setting
ok 5 389c - Create PLUG with limit setting
ok 6 384a - Delete PLUG with valid handle
ok 7 439a - Replace PLUG with limit setting
ok 8 9831 - Change PLUG with limit setting
./tdc.py -c sfb
ok 1 3294 - Create SFB with default setting
ok 2 430a - Create SFB with rehash setting
ok 3 3410 - Create SFB with db setting
ok 4 49a0 - Create SFB with limit setting
ok 5 1241 - Create SFB with max setting
ok 6 3249 - Create SFB with target setting
ok 7 30a9 - Create SFB with increment setting
ok 8 239a - Create SFB with decrement setting
ok 9 9301 - Create SFB with penalty_rate setting
ok 10 2a01 - Create SFB with penalty_burst setting
ok 11 3209 - Change SFB with rehash setting
ok 12 5447 - Show SFB class
./tdc.py -c sfq
ok 1 7482 - Create SFQ with default setting
ok 2 c186 - Create SFQ with limit setting
ok 3 ae23 - Create SFQ with perturb setting
ok 4 a430 - Create SFQ with quantum setting
ok 5 4539 - Create SFQ with divisor setting
ok 6 b089 - Create SFQ with flows setting
ok 7 99a0 - Create SFQ with depth setting
ok 8 7389 - Create SFQ with headdrop setting
ok 9 6472 - Create SFQ with redflowlimit setting
ok 10 8929 - Show SFQ class
./tdc.py -c skbprio
ok 1 283e - Create skbprio with default setting
ok 2 c086 - Create skbprio with limit setting
ok 3 6733 - Change skbprio with limit setting
ok 4 2958 - Show skbprio class
./tdc.py -c taprio
ok 1 ba39 - Add taprio Qdisc to multi-queue device (8 queues)
ok 2 9462 - Add taprio Qdisc with multiple sched-entry
ok 3 8d92 - Add taprio Qdisc with txtime-delay
ok 4 d092 - Delete taprio Qdisc with valid handle
ok 5 8471 - Show taprio class
ok 6 0a85 - Add taprio Qdisc to single-queue device
./tdc.py -c tbf
ok 1 6430 - Create TBF with default setting
ok 2 0518 - Create TBF with mtu setting
ok 3 320a - Create TBF with peakrate setting
ok 4 239b - Create TBF with latency setting
ok 5 c975 - Create TBF with overhead setting
ok 6 948c - Create TBF with linklayer setting
ok 7 3549 - Replace TBF with mtu
ok 8 f948 - Change TBF with latency time
ok 9 2348 - Show TBF class
./tdc.py -c teql
ok 1 84a0 - Create TEQL with default setting
ok 2 7734 - Create TEQL with multiple device
ok 3 34a9 - Delete TEQL with valid handle
ok 4 6289 - Show TEQL stats
selftests/tc-testing: add selftests for teql qdisc
Test 84a0: Create TEQL with default setting
Test 7734: Create TEQL with multiple device
Test 34a9: Delete TEQL with valid handle
Test 6289: Show TEQL stats
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Test 6430: Create TBF with default setting
Test 0518: Create TBF with mtu setting
Test 320a: Create TBF with peakrate setting
Test 239b: Create TBF with latency setting
Test c975: Create TBF with overhead setting
Test 948c: Create TBF with linklayer setting
Test 3549: Replace TBF with mtu
Test f948: Change TBF with latency time
Test 2348: Show TBF class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for taprio qdisc
Test ba39: Add taprio Qdisc to multi-queue device (8 queues)
Test 9462: Add taprio Qdisc with multiple sched-entry
Test 8d92: Add taprio Qdisc with txtime-delay
Test d092: Delete taprio Qdisc with valid handle
Test 8471: Show taprio class
Test 0a85: Add taprio Qdisc to single-queue device
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for skbprio qdisc
Test 283e: Create skbprio with default setting
Test c086: Create skbprio with limit setting
Test 6733: Change skbprio with limit setting
Test 2958: Show skbprio class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Test 7482: Create SFQ with default setting
Test c186: Create SFQ with limit setting
Test ae23: Create SFQ with perturb setting
Test a430: Create SFQ with quantum setting
Test 4539: Create SFQ with divisor setting
Test b089: Create SFQ with flows setting
Test 99a0: Create SFQ with depth setting
Test 7389: Create SFQ with headdrop setting
Test 6472: Create SFQ with redflowlimit setting
Test 8929: Show SFQ class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Test 3294: Create SFB with default setting
Test 430a: Create SFB with rehash setting
Test 3410: Create SFB with db setting
Test 49a0: Create SFB with limit setting
Test 1241: Create SFB with max setting
Test 3249: Create SFB with target setting
Test 30a9: Create SFB with increment setting
Test 239a: Create SFB with decrement setting
Test 9301: Create SFB with penalty_rate setting
Test 2a01: Create SFB with penalty_burst setting
Test 3209: Change SFB with rehash setting
Test 5447: Show SFB class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for plug qdisc
Test 3289: Create PLUG with default setting
Test 0917: Create PLUG with block setting
Test 483b: Create PLUG with release setting
Test 4995: Create PLUG with release_indefinite setting
Test 389c: Create PLUG with limit setting
Test 384a: Delete PLUG with valid handle
Test 439a: Replace PLUG with limit setting
Test 9831: Change PLUG with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for pfifo_fast qdisc
Test 900c: Create pfifo_fast with default setting
Test 7470: Dump pfifo_fast stats
Test b974: Replace pfifo_fast with different handle
Test 3240: Delete pfifo_fast with valid handle
Test 4385: Delete pfifo_fast with invalid handle
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Test 4812: Create HHF with default setting
Test 8a92: Create HHF with limit setting
Test 3491: Create HHF with quantum setting
Test ba04: Create HHF with reset_timeout setting
Test 4238: Create HHF with admit_bytes setting
Test 839f: Create HHF with evict_timeout setting
Test a044: Create HHF with non_hh_weight setting
Test 32f9: Change HHF with limit setting
Test 385e: Show HHF class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for gred qdisc
Test 8942: Create GRED with default setting
Test 5783: Create GRED with grio setting
Test 8a09: Create GRED with limit setting
Test 48cb: Create GRED with ecn setting
Test 763a: Change GRED setting
Test 8309: Show GRED class
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Test 983b: Create FQ with default setting
Test 38a1: Create FQ with limit packet setting
Test 0a18: Create FQ with flow_limit setting
Test 2390: Create FQ with quantum setting
Test 845b: Create FQ with initial_quantum setting
Test 9398: Create FQ with maxrate setting
Test 342c: Create FQ with nopacing setting
Test 6391: Create FQ with refill_delay setting
Test 238b: Create FQ with low_rate_threshold setting
Test 7582: Create FQ with orphan_mask setting
Test 4894: Create FQ with timer_slack setting
Test 324c: Create FQ with ce_threshold setting
Test 424a: Create FQ with horizon time setting
Test 89e1: Create FQ with horizon_cap setting
Test 32e1: Delete FQ with valid handle
Test 49b0: Replace FQ with limit setting
Test 9478: Change FQ with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Test 34ba: Create ETF with default setting
Test 438f: Create ETF with delta nanos setting
Test 9041: Create ETF with deadline_mode setting
Test 9a0c: Create ETF with skip_sock_check setting
Test 2093: Delete ETF with valid handle
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for codel qdisc
Test 983a: Create CODEL with default setting
Test 38aa: Create CODEL with limit packet setting
Test 9178: Create CODEL with target setting
Test 78d1: Create CODEL with interval setting
Test 238a: Create CODEL with ecn setting
Test 939c: Create CODEL with ce_threshold setting
Test 8380: Delete CODEL with valid handle
Test 289c: Replace CODEL with limit setting
Test 0648: Change CODEL with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests/tc-testing: add selftests for choke qdisc
Test 8937: Create CHOKE with default setting
Test 48c0: Create CHOKE with min packet setting
Test 38c1: Create CHOKE with max packet setting
Test 234a: Create CHOKE with ecn setting
Test 4380: Create CHOKE with burst setting
Test 48c7: Delete CHOKE with valid handle
Test 4398: Replace CHOKE with min setting
Test 0301: Change CHOKE with limit setting
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: core_acl_flex_actions: Split memcpy() of struct flow_action_cookie flexible array
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), split the memcpy() of the header and the payload
so no false positive run-time overflow warning will be generated.
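A userspace sketch of the split-copy pattern, assuming a structure made of a
fixed header followed by a flexible-array payload. The names cookie_sketch
and cookie_dup() are hypothetical; the point is only that the header and the
payload are copied by two separate memcpy() calls, each with a destination
whose size is visible to the compiler.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header + flexible-array payload layout. */
struct cookie_sketch {
	uint32_t len;		/* number of payload bytes */
	uint8_t data[];		/* payload */
};

/* Duplicate a cookie, copying header and payload separately. */
static struct cookie_sketch *cookie_dup(const struct cookie_sketch *src)
{
	struct cookie_sketch *dst = malloc(sizeof(*dst) + src->len);

	if (!dst)
		return NULL;

	memcpy(dst, src, sizeof(*src));		/* fixed-size header only */
	memcpy(dst->data, src->data, src->len);	/* flexible-array payload */
	return dst;
}
```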
This series is quite a bit bigger than what I normally like to send,
and I apologize for that. I would like it to get incorporated in
its entirety this week if possible, and splitting up the series
carries a small risk that that wouldn't happen.
Each IPA register has a defined offset, and in most cases, a set
of masks that define the width and position of fields within the
register. Most registers currently use the same offset for all
versions of IPA. Usually fields within registers are also the same
across many versions. Offsets and fields like this are defined
using preprocessor constants.
When a register has a different offset for different versions of
IPA, an inline function is used to determine its offset. And in
places where a field differs between versions, an inline function is
used to determine how a value is encoded within the field, depending
on IPA version.
Starting with IPA version 5.0, the number of IPA endpoints supported
is greater than 32. As a consequence, *many* IPA register offsets
differ considerably from prior versions. This increase in endpoints
also requires a lot of field sizes and/or positions to change (such
as those that contain an endpoint ID).
Defining these things with constants is no longer simple, and rather
than fill the code with one-off functions to define offsets and
encode field values, this series puts in place a new way of defining
IPA registers and their fields. Note that this series creates this
new scheme, but does not add IPA v5.0+ support.
An enumerated type will now define a unique ID for each IPA register.
Each defined register will have a structure that contains its offset
and its name (a printable string). Each version of IPA will have an
array of these register structures, indexed by register ID.
Some "parameterized" registers are duplicated (this is not new).
For example, each endpoint has an INIT_HDR register, and the offset
of a given endpoint's INIT_HDR register is dependent on the endpoint
number (the parameter). In such cases, the register's "stride" is
defined as the distance between two of these registers.
If a register contains fields, each field will have a unique ID
that's used as an index into an array of field masks defined for the
register. The register structure also defines the number of entries
in this field array.
When a register is to be used in code, its register structure will
be fetched using function ipa_reg(). Other functions are then used
to determine the register's offset, or to encode a value into one of
the register's fields, and so on.
Each version of IPA defines the set of registers that are available,
including all fields for these registers. The array of defined
registers is set up at probe time based on the IPA version, and it
is associated with the main IPA structure.
====================
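The register scheme described above can be sketched in plain C roughly as
follows. This is only an illustration under stated assumptions: the
identifiers (reg_id_sketch, reg_sketch, reg_lookup(), reg_n_offset()) and the
offset and stride values are made up for the example and are not taken from
the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register IDs; a real list would be much longer. */
enum reg_id_sketch {
	REG_COMP_CFG,
	REG_ENDP_INIT_HDR,
	REG_ID_COUNT,
};

struct reg_sketch {
	uint32_t offset;	/* offset of the (first) register */
	uint32_t stride;	/* distance between instances, 0 if single */
	const char *name;	/* printable name */
};

/* One table per hardware version, indexed by register ID (made-up values). */
static const struct reg_sketch regs_example[REG_ID_COUNT] = {
	[REG_COMP_CFG]      = { 0x003c, 0x0000, "COMP_CFG" },
	[REG_ENDP_INIT_HDR] = { 0x0810, 0x0070, "ENDP_INIT_HDR" },
};

/* Fetch a register descriptor by ID; NULL if the ID is out of range. */
static const struct reg_sketch *reg_lookup(const struct reg_sketch *table,
					   unsigned int id)
{
	return id < REG_ID_COUNT ? &table[id] : NULL;
}

/* Offset of instance n of a parameterized register. */
static uint32_t reg_n_offset(const struct reg_sketch *reg, uint32_t n)
{
	return reg->offset + n * reg->stride;
}
```

Selecting a different table at probe time is then enough to change every
offset used by the rest of the code.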
Alex Elder [Mon, 26 Sep 2022 22:09:31 +0000 (17:09 -0500)]
net: ipa: define remaining IPA register fields
Define the fields for the ENDP_INIT_DEAGGR, ENDP_INIT_RSRC_GRP,
ENDP_INIT_SEQ, ENDP_STATUS, ENDP_FILTER_ROUTER_HSH_CFG, and
IPA_IRQ_UC IPA registers for all supported IPA versions.
Create enumerated types to identify fields for these IPA registers.
Use IPA_REG_FIELDS() and IPA_REG_STRIDE_FIELDS() to specify the
field mask values defined for these registers, for each supported
version of IPA.
Use ipa_reg_encode() and ipa_reg_bit() to build up the values to be
written to these registers. Remove an inline function and all the
*_FMASK symbols that are no longer used.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:30 +0000 (17:09 -0500)]
net: ipa: define more IPA endpoint register fields
Define the fields for the ENDP_INIT_MODE, ENDP_INIT_AGGR,
ENDP_INIT_HOL_BLOCK_EN, and ENDP_INIT_HOL_BLOCK_TIMER IPA
registers for all supported IPA versions.
Create enumerated types to identify fields for these IPA registers.
Use IPA_REG_STRIDE_FIELDS() to specify the field mask values defined
for these registers, for each supported version of IPA.
Change aggr_time_limit_encode() and hol_block_timer_encode() so they
take an ipa_reg pointer, and use that register's fields to compute
their encoded results. Have aggr_time_limit_encode() take an IPA
pointer rather than version, to match hol_block_timer_encode().
Use ipa_reg_encode(), ipa_reg_bit(), and ipa_reg_field_max() to
manipulate values to be written to these registers. Remove the
definitions of the various inline functions and *_FMASK symbols that
are no longer used.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:29 +0000 (17:09 -0500)]
net: ipa: define some IPA endpoint register fields
Define the fields for the ENDP_INIT_CTRL, ENDP_INIT_CFG, ENDP_INIT_NAT,
ENDP_INIT_HDR, and ENDP_INIT_HDR_EXT IPA registers for all supported
IPA versions.
Create enumerated types to identify fields for these IPA registers.
Use IPA_REG_STRIDE_FIELDS() to specify the field mask values defined
for these registers, for each supported version of IPA.
Move ipa_header_size_encoded() and ipa_metadata_offset_encoded() out
of "ipa_reg.h" and into "ipa_endpoint.c". Change them so they take
an additional ipa_reg structure argument, and use ipa_reg_encode()
to encode the parts of the header size and offset prior to writing
to the register. Change their names to be verbs rather than nouns.
Use ipa_reg_encode(), ipa_reg_bit(), and ipa_reg_field_max() to
manipulate values to be written to these registers. Remove the
definition of the no-longer-used *_FMASK symbols.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Define the fields for the {SRC,DST}_RSRC_GRP_{01,23,45,67}_RSRC_TYPE
IPA registers for all supported IPA versions.
Create enumerated types to identify fields for these IPA registers.
Use IPA_REG_STRIDE_FIELDS() to specify the field mask values defined
for these registers, for each supported version of IPA.
Use ipa_reg_encode() to build up the values to be written to these
registers.
Remove the definition of the no-longer-used *_FMASK symbols.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:27 +0000 (17:09 -0500)]
net: ipa: define even more IPA register fields
Define the fields for the FLAVOR_0, IDLE_INDICATION_CFG,
QTIME_TIMESTAMP_CFG, TIMERS_XO_CLK_DIV_CFG and TIMERS_PULSE_GRAN_CFG
IPA registers for all supported IPA versions.
Create enumerated types to identify fields for these IPA registers.
Use IPA_REG_FIELDS() to specify the field mask values defined for
these registers, for each supported version of IPA.
Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be
written to these registers. Use ipa_reg_decode() to extract field
values from the FLAVOR_0 register.
Remove the definition of the no-longer-used *_FMASK symbols.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:26 +0000 (17:09 -0500)]
net: ipa: define more IPA register fields
Define the fields for the LOCAL_PKT_PROC_CNTXT, COUNTER_CFG, and
IPA_TX_CFG IPA registers for all supported IPA versions.
Create enumerated types to identify fields for these IPA registers.
Use IPA_REG_FIELDS() to specify the field mask values defined for
these registers, for each supported version of IPA.
Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be
written to these registers. Remove the definition of the *_FMASK
symbols as well as proc_cntxt_base_addr_encoded(), because they are
no longer needed.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:25 +0000 (17:09 -0500)]
net: ipa: define some more IPA register fields
Define the fields for the SHARED_MEM_SIZE, QSB_MAX_WRITES,
QSB_MAX_READS, FILT_ROUT_HASH_EN, and FILT_ROUT_HASH_FLUSH IPA
registers for all supported IPA versions.
Create enumerated types to identify fields for these registers. Use
IPA_REG_FIELDS() to specify the field mask values defined for these
registers, for each supported version of IPA.
Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be
written to these registers rather than using the *_FMASK
preprocessor symbols.
Remove the definition of the now unused *_FMASK symbols.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:24 +0000 (17:09 -0500)]
net: ipa: define CLKON_CFG and ROUTE IPA register fields
Create the ipa_reg_clkon_cfg_field_id enumerated type, which
identifies the fields for the CLKON_CFG IPA register. Add "CLKON_"
to a few short names to try to avoid name conflicts. Create the
ipa_reg_route_field_id enumerated type, which identifies the fields
for the ROUTE IPA register.
Use IPA_REG_FIELDS() to specify the field mask values defined for
these registers, for each supported version of IPA.
Use ipa_reg_bit() and ipa_reg_encode() to build up the values to be
written to these registers rather than using the *_FMASK
preprocessor symbols.
Remove the definition of the now unused *_FMASK symbols.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:23 +0000 (17:09 -0500)]
net: ipa: define COMP_CFG IPA register fields
Create the ipa_reg_comp_cfg_field_id enumerated type, which
identifies the fields for the COMP_CFG IPA register.
Use IPA_REG_FIELDS() to specify the field mask values defined for
this register, for each supported version of IPA.
Use ipa_reg_bit() to build up the value to be written to this
register rather than using the *_FMASK preprocessor symbols.
Remove the definition of the *_FMASK symbols, along with the inline
functions that were used to encode certain fields whose position
and/or width within the register was dependent on IPA version.
Take this opportunity to represent all one-bit fields using BIT(x)
rather than GENMASK(x, x).
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:22 +0000 (17:09 -0500)]
net: ipa: introduce ipa_reg field masks
Add register field descriptors to the ipa_reg structure. A field in
a register is defined by a field mask, which is a 32-bit mask having
a single contiguous range of bits set.
For each register that has at least one field defined, an enumerated
type will identify the register's fields. The ipa_reg structure for
that register will include an array fmask[] of field masks, indexed
by that enumerated type. Each field mask defines the position and
bit width of a field. An additional "fcount" records how many
fields (masks) are defined for a given register.
Introduce two macros to be used to define registers that have at
least one field.
Introduce a few new functions related to field masks. The first
simply returns a field mask, given an IPA register pointer and field
mask ID. A variant of that is meant to be used for the special case
of single-bit field masks.
Next, ipa_reg_encode() identifies a field with an IPA register
pointer and a field ID, and takes a value to represent in that
field. The result encodes the value in the appropriate place to be
stored in the register. This is roughly modeled after the bitmask
operations (like u32_encode_bits()).
Another function (ipa_reg_decode()) similarly identifies a register
field, but the value supplied to it represents a full register
value. The value encoded in the field is extracted from the value
and returned. This is also roughly modeled after bitmask operations
(such as u32_get_bits()).
Finally, ipa_reg_field_max() returns the maximum value representable
by a field.
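
The helpers described above can be sketched in userspace C roughly as
follows. This is not the driver's code: the descriptor is reduced to
fcount and fmask[], the field IDs and mask values are made up, and the
name ipa_reg_fmask() for the "returns a field mask" helper is an
assumption (the description does not name it).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field IDs for one example register. */
enum demo_field_id {
	DEMO_ENABLE,		/* single-bit field */
	DEMO_SIZE,		/* multi-bit field */
	DEMO_FIELD_COUNT,
};

/* Simplified stand-in for the descriptor: only fcount and fmask[]. */
struct ipa_reg {
	uint32_t fcount;	/* number of entries in fmask[] */
	const uint32_t *fmask;	/* one contiguous mask per field ID */
};

/* Made-up masks: bit 0, and bits 8..15. */
static const uint32_t demo_fmask[DEMO_FIELD_COUNT] = {
	[DEMO_ENABLE]	= 0x00000001u,
	[DEMO_SIZE]	= 0x0000ff00u,
};

static const struct ipa_reg demo_reg = {
	.fcount	= DEMO_FIELD_COUNT,
	.fmask	= demo_fmask,
};

/* Return the field mask for a register and field ID. */
static uint32_t ipa_reg_fmask(const struct ipa_reg *reg, uint32_t field_id)
{
	assert(reg && field_id < reg->fcount);
	return reg->fmask[field_id];
}

/* Bit position of the lowest set bit of a non-zero mask. */
static unsigned int demo_mask_shift(uint32_t fmask)
{
	unsigned int shift = 0;

	assert(fmask != 0);
	while (!(fmask & 1u)) {
		fmask >>= 1;
		shift++;
	}
	return shift;
}

/* Place val at the field's position; bits that do not fit are dropped. */
static uint32_t ipa_reg_encode(const struct ipa_reg *reg, uint32_t field_id,
			       uint32_t val)
{
	uint32_t fmask = ipa_reg_fmask(reg, field_id);

	return (val << demo_mask_shift(fmask)) & fmask;
}

/* Extract the field's value from a full register value. */
static uint32_t ipa_reg_decode(const struct ipa_reg *reg, uint32_t field_id,
			       uint32_t val)
{
	uint32_t fmask = ipa_reg_fmask(reg, field_id);

	return (val & fmask) >> demo_mask_shift(fmask);
}

/* Largest value representable in the field. */
static uint32_t ipa_reg_field_max(const struct ipa_reg *reg, uint32_t field_id)
{
	uint32_t fmask = ipa_reg_fmask(reg, field_id);

	return fmask >> demo_mask_shift(fmask);
}
```

With these example masks, encoding 0x12 into DEMO_SIZE places it in
bits 8..15, and decoding a full register value recovers it regardless
of what the other fields hold.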
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:21 +0000 (17:09 -0500)]
net: ipa: introduce ipa_reg()
Create a new function that returns a register descriptor given its
ID. Change ipa_reg_offset() and ipa_reg_n_offset() so they take a
register descriptor argument rather than an IPA pointer and register
ID. Have them accept null pointers (and return an invalid 0 offset),
to avoid the need for excessive error checking. (A warning is issued
whenever ipa_reg() returns 0).
Call ipa_reg() or ipa_reg_n() to look up information about the
register before calls to ipa_reg_offset() and ipa_reg_n_offset().
Delay looking up offsets until they're needed to read or write
registers.
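
A minimal userspace sketch of the lookup pattern described above,
assuming a simplified descriptor (offset, stride, name) and a "regs"
array indexed by register ID. The DEMO_* register IDs, the offsets and
the use of fprintf() for the warning are hypothetical; only the
function names and the null-tolerant behavior come from the
description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register IDs; DEMO_REG_B is unsupported in this table. */
enum ipa_reg_id {
	DEMO_REG_A,
	DEMO_REG_B,
	DEMO_REG_N,		/* parameterized register */
	DEMO_REG_ID_COUNT,
};

/* Simplified register descriptor. */
struct ipa_reg {
	uint32_t offset;
	uint32_t stride;	/* distance between consecutive instances */
	const char *name;
};

/* Simplified IPA structure: just the per-version descriptor array. */
struct ipa {
	const struct ipa_reg **regs;
};

static const struct ipa_reg demo_reg_a = {
	.offset = 0x0040, .name = "DEMO_REG_A",
};
static const struct ipa_reg demo_reg_n = {
	.offset = 0x0800, .stride = 0x0070, .name = "DEMO_REG_N",
};

/* Unsupported registers are left as null pointers. */
static const struct ipa_reg *demo_regs[DEMO_REG_ID_COUNT] = {
	[DEMO_REG_A]	= &demo_reg_a,
	[DEMO_REG_N]	= &demo_reg_n,
};

static const struct ipa demo_ipa = { .regs = demo_regs };

/* Look up a descriptor by ID; warn and return NULL if it is not defined. */
static const struct ipa_reg *ipa_reg(const struct ipa *ipa,
				     enum ipa_reg_id reg_id)
{
	const struct ipa_reg *reg = NULL;

	assert(ipa && ipa->regs);
	if ((unsigned int)reg_id < DEMO_REG_ID_COUNT)
		reg = ipa->regs[reg_id];
	if (!reg)
		fprintf(stderr, "warning: register %u not defined\n",
			(unsigned int)reg_id);
	return reg;
}

/* A null descriptor yields the invalid offset 0, so callers need no check. */
static uint32_t ipa_reg_offset(const struct ipa_reg *reg)
{
	return reg ? reg->offset : 0;
}

/* Offset of instance n of a parameterized register. */
static uint32_t ipa_reg_n_offset(const struct ipa_reg *reg, uint32_t n)
{
	return reg ? reg->offset + n * reg->stride : 0;
}
```

Because the offset helpers tolerate a null descriptor, a caller can
chain ipa_reg_offset(ipa_reg(ipa, id)) and rely on the single warning
issued by the lookup.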
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:20 +0000 (17:09 -0500)]
net: ipa: use ipa_reg[] array for register offsets
Use the array of register descriptors assigned at initialization
time to determine the offset (and where used, stride) for IPA
registers. Issue a warning if an offset is requested for a register
that's not valid for the current system.
Remove all IPA_REG_*_OFFSET macros, as well as inline static
functions that returned register offsets.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Create a new subdirectory "reg", which contains a register
definition file for each supported version of IPA. Each register
definition contains the register's offset, and for parameterized
registers, the stride (distance between consecutive instances of the
register). Finally, it includes an all-caps printable register name.
In these files, each IPA version defines an array of IPA register
definition pointers, with unsupported registers defined with a null
pointer. The array is indexed by the ipa_reg_id enumerated type.
At initialization time, the appropriate register definition array to
use is selected based on the IPA version, and assigned to a new
"regs" field in the IPA structure.
Extend ipa_reg_valid() so it fails if a valid register is not
defined.
This patch simply puts this infrastructure in place; the next will
use it.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:18 +0000 (17:09 -0500)]
net: ipa: use IPA register IDs to determine offsets
Expose two inline functions that return the offset for a register
whose ID is provided; one of them takes an additional argument
that's used for registers that are parameterized. These both use
a common helper function __ipa_reg_offset(), which just uses the
offset symbols already defined.
Replace all references to the offset macros defined for IPA
registers with calls to ipa_reg_offset() or ipa_reg_n_offset().
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Alex Elder [Mon, 26 Sep 2022 22:09:17 +0000 (17:09 -0500)]
net: ipa: introduce IPA register IDs
Create a new ipa_reg_id enumerated type, which identifies each IPA
register with a symbolic identifier. Use short names, but in some
cases (such as "BCR") add "IPA_" to the name to help avoid name
conflicts.
Create two functions that indicate register validity. The first
concisely indicates whether a register is valid for a given version
of IPA, and if so, whether it is defined. The second indicates
whether a register is valid for TX or RX endpoints.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
s390/qeth: Split memcpy() of struct qeth_ipacmd_addr_change flexible array
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), split the memcpy() of the header and the payload
so no false positive run-time overflow warning will be generated.
Pavel Begunkov [Mon, 26 Sep 2022 10:35:36 +0000 (11:35 +0100)]
selftests/net: enable io_uring sendzc testing
443da88a4c8a5 ("selftests/io_uring: test zerocopy send") added io_uring
zerocopy tests but forgot to enable it in make runs. Add missing
io_uring_zerocopy_tx.sh into TEST_PROGS.
====================
devlink: fix order of port and netdev register in drivers
Some of the drivers use wrong order in registering devlink port and
netdev, registering netdev first. That was not intended as the devlink
port is some sort of parent for the netdev. Fix the ordering.
Note that the follow-up patchset is going to make this ordering
mandatory.
====================
NFC: hci: Split memcpy() of struct hcp_message flexible array
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), split the memcpy() of the header and the payload
so no false positive run-time overflow warning will be generated. This
split already existed for the "firstfrag" case, so just generalize the
logic further.
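
The shape of such a split can be sketched in plain userspace C as
below. The struct and function names are hypothetical, and the sketch
does not reproduce the kernel's hardened memcpy(); it only shows one
copy for the fixed-size header and a second copy into the flexible
array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message: a fixed header followed by a flexible array. */
struct demo_msg {
	uint8_t header;
	uint8_t data[];
};

/*
 * Build a message from a raw buffer that holds the header byte followed
 * by the payload. Returns NULL if the buffer is too short or allocation
 * fails; the caller frees the result.
 */
static struct demo_msg *demo_msg_from_bytes(const uint8_t *buf, size_t len)
{
	const size_t hdr_len = offsetof(struct demo_msg, data);
	struct demo_msg *msg;
	size_t data_len;

	assert(buf);
	if (len < hdr_len)
		return NULL;
	data_len = len - hdr_len;

	msg = malloc(sizeof(*msg) + data_len);
	if (!msg)
		return NULL;

	/* First copy: only the fixed-size header. */
	memcpy(&msg->header, buf, hdr_len);
	/* Second copy: the payload, straight into the flexible array. */
	memcpy(msg->data, buf + hdr_len, data_len);

	return msg;
}
```

Each memcpy() now has a destination that is either a fixed-size member
or the flexible array itself, rather than a single copy that spans
both.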
Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220924040835.3364912-1-keescook@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Sun, 25 Sep 2022 14:48:43 +0000 (15:48 +0100)]
net: ethernet: mtk_eth_soc: fix usage of foe_entry_size
As the actual size depends on foe_entry_size, sizeof(hwe->data) can no
longer be used, so commit e17be858137834
("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc")
replaced the use of sizeof(hwe->data).
However, replacing it with ppe->eth->soc->foe_entry_size is wrong as
foe_entry_size represents the size of the whole descriptor and not just
the 'data' field.
Fix this by subtracting the size of the only other field in the struct
'ib1', so we actually end up with the correct size to be copied to the
data field.
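
The size arithmetic can be illustrated with a small userspace sketch.
The struct below is a hypothetical stand-in (one 'ib1' word followed by
a 'data' array of made-up length), and the helper names are invented
for illustration; only the idea of subtracting sizeof(ib1) from
foe_entry_size comes from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FOE_DATA_WORDS	19	/* made-up length */

/* Hypothetical descriptor: 'ib1' followed by 'data'. */
struct demo_foe_entry {
	uint32_t ib1;
	uint32_t data[DEMO_FOE_DATA_WORDS];
};

/*
 * foe_entry_size is the size of the whole descriptor, so the number of
 * bytes belonging to 'data' is that size minus the size of 'ib1'.
 */
static size_t demo_foe_data_size(size_t foe_entry_size)
{
	assert(foe_entry_size >= sizeof(((struct demo_foe_entry *)0)->ib1));
	return foe_entry_size - sizeof(((struct demo_foe_entry *)0)->ib1);
}

/* Copy only the 'data' part of an entry, leaving the target's ib1 alone. */
static void demo_foe_copy_data(struct demo_foe_entry *hwe,
			       const struct demo_foe_entry *entry,
			       size_t foe_entry_size)
{
	size_t len = demo_foe_data_size(foe_entry_size);

	assert(len <= sizeof(hwe->data));
	memcpy(hwe->data, entry->data, len);
}
```

Passing the full foe_entry_size as the copy length instead would read
and write four bytes past the end of 'data' in this layout.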
Reported-by: Chen Minqiang <ptpt52@gmail.com> Fixes: e17be858137834 ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/YzBqPIgQR2gLrPoK@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Golle [Sun, 25 Sep 2022 14:47:20 +0000 (15:47 +0100)]
net: ethernet: mtk_eth_soc: fix wrong use of new helper function
In function mtk_foe_entry_set_vlan() the call to field accessor macro
FIELD_GET(MTK_FOE_IB1_BIND_VLAN_LAYER, entry->ib1)
has been wrongly replaced by
mtk_prep_ib1_vlan_layer(eth, entry->ib1)
Use correct helper function mtk_get_ib1_vlan_layer instead.
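
Why swapping a "get" accessor for a "prep" helper is a bug can be seen
in a small userspace sketch. The mask value, shift and helper names
below are hypothetical and only mimic the usual get/prep semantics:
"get" extracts a field from a full register word, while "prep" shifts a
bare field value into position.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-bit field at bits 16..18. */
#define DEMO_VLAN_LAYER_MASK	0x00070000u
#define DEMO_VLAN_LAYER_SHIFT	16

/* "get": take a full register word, return the field's value. */
static uint32_t demo_get_vlan_layer(uint32_t ib1)
{
	return (ib1 & DEMO_VLAN_LAYER_MASK) >> DEMO_VLAN_LAYER_SHIFT;
}

/* "prep": take a bare field value, return it shifted into position. */
static uint32_t demo_prep_vlan_layer(uint32_t val)
{
	assert(DEMO_VLAN_LAYER_SHIFT < 32);
	return (val << DEMO_VLAN_LAYER_SHIFT) & DEMO_VLAN_LAYER_MASK;
}
```

For a register word 0x00020005, the "get" helper returns the layer
value 2, whereas feeding the same word to the "prep" helper shifts its
low bits into the field and yields 0x00050000, which is not a layer
count at all.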
Reported-by: Chen Minqiang <ptpt52@gmail.com> Fixes: 7d6e0fd46916b3 ("net: ethernet: mtk_eth_soc: introduce flow offloading support for mt7986") Signed-off-by: Daniel Golle <daniel@makrotopia.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/YzBp+Kk04CFDys4L@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
net: openvswitch: metering and conntrack in userns
Currently using openvswitch in a non-initial user namespace, e.g., an
unprivileged container, is possible but without metering and conntrack
support. This is due to the restriction of the corresponding Netlink
interfaces to the global CAP_NET_ADMIN.
These simple patches switch from GENL_ADMIN_PERM to GENL_UNS_ADMIN_PERM
in several cases to allow this also for the unprivileged container
use case.
We tested this for unprivileged containers created by the container
manager of GyroidOS (gyroidos.github.io). However, for other container
managers such as LXC or systemd which provide unprivileged containers
this should apply equally.
====================
Michael Weiß [Fri, 23 Sep 2022 13:38:20 +0000 (15:38 +0200)]
net: openvswitch: allow conntrack in non-initial user namespace
Similar to the previous commit, the Netlink interface of the OVS
conntrack module was restricted to global CAP_NET_ADMIN by using
GENL_ADMIN_PERM. This is changed to GENL_UNS_ADMIN_PERM to support
unprivileged containers in non-initial user namespace.
Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Michael Weiß [Fri, 23 Sep 2022 13:38:19 +0000 (15:38 +0200)]
net: openvswitch: allow metering in non-initial user namespace
The Netlink interface for metering was restricted to global CAP_NET_ADMIN
by using GENL_ADMIN_PERM. To allow metering in a non-initial user namespace,
e.g., a container, this is changed to GENL_UNS_ADMIN_PERM.
Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tony Lu [Thu, 22 Sep 2022 12:19:07 +0000 (20:19 +0800)]
net/smc: Support SO_REUSEPORT
This enables SO_REUSEPORT [1] for clcsock when it is set on smc socket,
so that some applications which use it can be transparently replaced
with SMC. Also, this helps improve load distribution.
Here is a simple test of NGINX + wrk with SMC. The CPU usage is collected
on NGINX (server) side as below.
====================
net: sunhme: Cleanups and logging improvements
This series is a continuation of [1] with a focus on logging improvements (in
the style of commit f3838b188273 ("net: sunhme: output link status with a single
print.")). I have included several of Rolf's patches in the series where
appropriate (with slight modifications). After this series is applied, many more
messages from this driver will come with driver/device information.
Additionally, most messages (especially debug messages) have been condensed onto
one line (as KERN_CONT messages get split).
Sean Anderson [Sat, 24 Sep 2022 01:53:38 +0000 (21:53 -0400)]
sunhme: Use vdbg for spam-y prints
The SXD, TXD, and RXD macros are used only once (or twice). Just use the
vdbg print, which seems to have been devised for these sorts of very
verbose messages.
Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Sat, 24 Sep 2022 01:53:37 +0000 (21:53 -0400)]
sunhme: Combine continued messages
This driver seems to have been written under the assumption that messages
can be continued arbitrarily. I'm not sure when this changed (if ever), but such
ad-hoc continuations are liable to be rudely interrupted. Convert all such
instances to single prints. This loses a bit of timing information (such as
when a line was constructed piecemeal as the function executed), but it's
easy to add a few prints if necessary. This also adds newlines to the ends
of any prints without them.
Since (almost every) debug print included the name of the function, include
it automatically.
Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Sat, 24 Sep 2022 01:53:36 +0000 (21:53 -0400)]
sunhme: Use (net)dev_foo wherever possible
Wherever possible, use the associated netdev (or device) when printing
errors or other messages. This makes it immediately clear what device
caused the error, and provides more information than just the device name.
Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Sat, 24 Sep 2022 01:53:35 +0000 (21:53 -0400)]
sunhme: Convert printk(KERN_FOO ...) to pr_foo(...)
This is a mostly-mechanical translation of the existing printks into
pr_foos. In several places, I have pasted messages which were broken over
several lines to allow for easier grepping.
Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Sat, 24 Sep 2022 01:53:34 +0000 (21:53 -0400)]
sunhme: Clean up debug infrastructure
Remove all the single-use debug conditionals, and just collect the debug
defines at the top of the file. HMD seems like it is used for general debug
info, so just redefine it as pr_debug. Additionally, instead of using the
default loglevel, use the debug loglevel for debugging.
Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Sat, 24 Sep 2022 01:53:30 +0000 (21:53 -0400)]
sunhme: Return an ERR_PTR from quattro_pci_find
In order to differentiate between a missing bridge and an OOM condition,
return ERR_PTRs from quattro_pci_find. This also does some general linting
in the area.
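
The error-pointer convention can be sketched in userspace as follows.
The demo_* helpers re-implement the idea of encoding a negative errno
in a pointer value; the lookup function, its parameters and the choice
of -ENODEV for a missing bridge are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO	4095

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long error)
{
	assert(error < 0 && error >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the range reserved for error codes. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_quattro {
	int id;
};

/*
 * Hypothetical lookup: distinguishes "no bridge" from "out of memory"
 * instead of returning NULL for both. The flags only simulate the two
 * failure cases. The caller frees a successful result.
 */
static struct demo_quattro *demo_quattro_find(int bridge_present,
					      int simulate_oom)
{
	struct demo_quattro *qp;

	if (!bridge_present)
		return demo_err_ptr(-ENODEV);

	qp = simulate_oom ? NULL : calloc(1, sizeof(*qp));
	if (!qp)
		return demo_err_ptr(-ENOMEM);

	return qp;
}
```

A caller can then test the result with demo_is_err() and act on the
specific code, for example treating -ENODEV as "not a Quattro card" and
propagating -ENOMEM as a hard failure.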
Signed-off-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sean Anderson [Sat, 24 Sep 2022 01:53:28 +0000 (21:53 -0400)]
sunhme: Remove version
Module versions are not very useful:
> The basic problem is, the version string does not identify the sources
> with enough accuracy. It says nothing about back ported fixes in
> stable kernels. It tells you nothing about vendor patches to the
> network core, etc.
This patchset https://lore.kernel.org/all/20220921140524.3831101-8-yangyingliang@huawei.com/T/
removed all set_drvdata(NULL) in driver remove function.
i2c_set_clientdata() is another wrapper of set drvdata function, to follow
the same convention, remove i2c_set_clientdata() called in driver remove
function in drivers/net/dsa/.
====================
xdp: Adjust xdp_frame layout to avoid using bitfields
Practical experience (and advice from Alexei) tell us that bitfields in
structs lead to un-optimized assembly code. I've verified this change
does lead to better x86_64 assembly, both via objdump and playing with
code snippets in godbolt.org.
Using scripts/bloat-o-meter shows the code size is reduced by 24
bytes for xdp_convert_buff_to_frame(), which gets inlined e.g. in
i40e_xmit_xdp_tx_ring(), which was used for microbenchmarking.
Microbenchmarking results do show improvements, but they are very small,
varying between 0.5 and 2 nanoseconds per packet.
The member @metasize is changed from u8 to u32. Future users of this
area could split this into two u16 fields. I've also benchmarked with
two u16 fields showing equal performance gains and code size reduction.
The moved member @frame_sz doesn't change sizeof struct due to existing
padding. As in xdp_buff, member @frame_sz is placed next to @flags, which
allows compiler to optimize assignment of these.
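
The layout change can be illustrated with two simplified stand-in
structs. Only @metasize, @frame_sz and @flags are taken from the
description; every other member, the struct names and the helper are
assumptions, and the equal-size observation holds for typical 64-bit
ABIs with 8-byte pointers, not necessarily elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 8-byte member standing in for the rest of the frame. */
struct demo_mem_info {
	uint32_t type;
	uint32_t id;
};

/* Before: metasize and frame_sz share one 32-bit word as bitfields. */
struct demo_frame_bitfields {
	void *data;
	uint16_t len;
	uint16_t headroom;
	uint32_t metasize:8;
	uint32_t frame_sz:24;
	struct demo_mem_info mem;
	void *dev_rx;
	uint32_t flags;
	/* 4 bytes of tail padding on typical 64-bit ABIs */
};

/* After: full-width members; frame_sz moves next to flags. */
struct demo_frame_plain {
	void *data;
	uint16_t len;
	uint16_t headroom;
	uint32_t metasize;
	struct demo_mem_info mem;
	void *dev_rx;
	uint32_t frame_sz;	/* occupies the former tail padding */
	uint32_t flags;
};

/* Plain stores: no read-modify-write of a shared word is needed. */
static void demo_frame_plain_set(struct demo_frame_plain *f,
				 uint32_t metasize, uint32_t frame_sz,
				 uint32_t flags)
{
	assert(f);
	f->metasize = metasize;
	f->frame_sz = frame_sz;
	f->flags = flags;
}
```

With 8-byte pointers both structs come out at the same size, because
the moved member lands in space that was previously padding.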
====================
Improve tsn_lib selftests for future distributed tasks
Some of the boards I am working with are limited in the number of ports
that they offer, and as more TSN related selftests are added, it is
important to be able to distribute the work among multiple boards.
A large part of implementing that is ensuring network-wide
synchronization, but also permitting more streams of data to flow
through the network. There is the more important aspect of also
coordinating the timing characteristics of those streams, and that is
also something that is tackled, although not in this modest patch set.
The goal here is not to introduce new selftests yet, but just to lay a
better foundation for them. These patches are a part of the cleanup work
I've done while working on selftests for frame preemption. They are
regression-tested with psfp.sh.
====================
Vladimir Oltean [Fri, 23 Sep 2022 21:00:15 +0000 (00:00 +0300)]
selftests: net: tsn_lib: run phc2sys in automatic mode
We can make the phc2sys helper not only synchronize a PHC to
CLOCK_REALTIME, which is what it currently does, but also CLOCK_REALTIME
to a PHC, which is going to be needed in distributed TSN tests.
Instead of making the complexity of the arguments passed to
phc2sys_start() explode, we can let it figure out the sync direction
automatically, based on ptp4l's port states.
Towards that goal, pass just the path to the desired ptp4l instance's
UNIX domain socket, and remove the $if_name argument (from which it
derives the PHC). Also adapt the one caller from the ocelot psfp.sh
test. In the case of psfp.sh, phc2sys_start is able to properly figure
out that CLOCK_REALTIME is the source clock and swp1's PHC is the
destination, because of the way in which ptp4l_start for the
UDS_ADDRESS_SWP1 was called: with slave_only=false, so it will always
win the BMCA and always become the sync master between itself and $h1.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move the PID variable for the isochron receiver into a separate
namespace per stats port, to allow multiple receivers (and/or
orchestration daemons) to be instantiated by the same script.
Preserve the existing behavior by making isochron_do() use the default
stats TCP port of 5000.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 23 Sep 2022 21:00:13 +0000 (00:00 +0300)]
selftests: net: tsn_lib: allow running ptp4l on multiple interfaces
Switch ports will want to act as Boundary Clocks, which are configured
using ptp4l by specifying the "-i" argument multiple times.
Since we track a log file and a pid file for each ptp4l instance, and we
want to be compatible with the existing single-port callers of
ptp4l_start and ptp4l_stop, pass the interface list as a single string
of space-separated values. Based on this, we create a label for each
ptp4l instance, where the spaces are replaced with underscores
(ptp4l_start "eth0 eth1" generates "ptp4l_pid_eth0_eth1").
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>