Use the new mcast querier state dump infrastructure and export vlans'
mcast context querier state embedded in attribute
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE.
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping the global IPv6 querier state. The state is dumped
only if our own querier is enabled or another, external querier has won
the election. For the bridge global state we use a new attribute,
IFLA_BR_MCAST_QUERIER_STATE, and embed the state inside.
The structure is:
 [IFLA_BR_MCAST_QUERIER_STATE]
  `[BRIDGE_QUERIER_IPV6_ADDRESS]     - ip address of the querier
  `[BRIDGE_QUERIER_IPV6_PORT]        - bridge port ifindex where the querier
                                       was seen (set only if external querier)
  `[BRIDGE_QUERIER_IPV6_OTHER_TIMER] - other querier timeout
IPv4 and IPv6 attributes are embedded at the same level of
IFLA_BR_MCAST_QUERIER_STATE. If we didn't dump anything we cancel the nest
and return.
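The "cancel the nest if nothing was dumped" step can be sketched in userspace C with a toy attribute buffer. This is only an illustration under stated assumptions: struct demo_msg, the demo_* helpers and the numeric attribute types are hypothetical and are not the kernel netlink API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy message buffer; a stand-in for an skb, not the kernel netlink API. */
struct demo_msg {
	unsigned char buf[256];
	size_t len;
};

/* Append one attribute: 2-byte type, 2-byte payload length, payload. */
static int demo_put(struct demo_msg *m, unsigned short type,
		    const void *data, unsigned short dlen)
{
	if (m->len + 4 + dlen > sizeof(m->buf))
		return -1;
	memcpy(m->buf + m->len, &type, 2);
	memcpy(m->buf + m->len + 2, &dlen, 2);
	if (dlen)
		memcpy(m->buf + m->len + 4, data, dlen);
	m->len += 4 + (size_t)dlen;
	return 0;
}

/* Open a nest and return its start offset, or -1 if out of room. */
static long demo_nest_start(struct demo_msg *m, unsigned short type)
{
	size_t start = m->len;

	return demo_put(m, type, NULL, 0) ? -1 : (long)start;
}

/* Close a nest: patch its length field to cover everything inside it. */
static void demo_nest_end(struct demo_msg *m, long start)
{
	unsigned short dlen;

	assert(start >= 0 && m->len >= (size_t)start + 4);
	dlen = (unsigned short)(m->len - (size_t)start - 4);
	memcpy(m->buf + start + 2, &dlen, 2);
}

/* Cancel a nest: trim the buffer back to where the nest began. */
static void demo_nest_cancel(struct demo_msg *m, long start)
{
	m->len = (size_t)start;
}

/* Dump querier state; emit nothing at all if there is no state to report. */
static int demo_dump_querier(struct demo_msg *m, int have_v4, int have_v6)
{
	long nest = demo_nest_start(m, 1 /* hypothetical STATE type */);
	unsigned int addr4 = 0x0a000001;
	unsigned char addr6[16] = { 0xfe, 0x80 };

	if (nest < 0)
		return -1;
	if (have_v4 && demo_put(m, 2, &addr4, sizeof(addr4)))
		goto cancel;
	if (have_v6 && demo_put(m, 3, addr6, sizeof(addr6)))
		goto cancel;
	if (m->len == (size_t)nest + 4) {
		demo_nest_cancel(m, nest);	/* nothing was dumped */
		return 0;
	}
	demo_nest_end(m, nest);
	return 0;
cancel:
	demo_nest_cancel(m, nest);
	return -1;
}
```

With neither family present the buffer length is unchanged after the call, which mirrors the empty-nest case described above; the IPv4 and IPv6 attributes sit side by side inside the same nest.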
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dumping the global IPv4 querier state. The state is dumped
only if our own querier is enabled or another, external querier has won
the election. For the bridge global state we use a new attribute,
IFLA_BR_MCAST_QUERIER_STATE, and embed the state inside.
The structure is:
 [IFLA_BR_MCAST_QUERIER_STATE]
  `[BRIDGE_QUERIER_IP_ADDRESS]     - ip address of the querier
  `[BRIDGE_QUERIER_IP_PORT]        - bridge port ifindex where the querier was
                                     seen (set only if external querier)
  `[BRIDGE_QUERIER_IP_OTHER_TIMER] - other querier timeout
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: bridge: mcast: make sure querier port/address updates are consistent
Use a sequence counter to make sure port/address updates can be read
consistently without requiring the bridge multicast_lock. We need to
zero out the port and address when the other querier has expired and
we're about to select ourselves as querier. br_multicast_read_querier
will be used later when dumping querier state. Updates are done only
with the multicast spinlock and softirqs disabled, while reads are done
from process context and from softirqs (due to notifications).
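The sequence-counter idea can be sketched in userspace with C11 atomics. This is an assumption-laden illustration, not the bridge code: the kernel uses its own seqcount primitives, and struct demo_querier and the demo_* functions below are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the querier's port/address pair. */
struct demo_querier {
	atomic_uint seq;		/* even: stable, odd: update in progress */
	_Atomic int port_ifidx;
	_Atomic uint32_t addr;
};

static void demo_querier_init(struct demo_querier *q)
{
	atomic_init(&q->seq, 0);
	atomic_init(&q->port_ifidx, 0);
	atomic_init(&q->addr, 0);
}

/* Writer: the caller is assumed to hold the lock that serialises updates. */
static void demo_querier_update(struct demo_querier *q, int ifidx, uint32_t addr)
{
	unsigned int s = atomic_load_explicit(&q->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no other writer may be active */
	atomic_store_explicit(&q->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&q->port_ifidx, ifidx, memory_order_relaxed);
	atomic_store_explicit(&q->addr, addr, memory_order_relaxed);
	atomic_store_explicit(&q->seq, s + 2, memory_order_release);
}

/* Reader: lockless, retries until it sees a stable, unchanged sequence. */
static void demo_querier_read(struct demo_querier *q, int *ifidx, uint32_t *addr)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&q->seq, memory_order_acquire);
		*ifidx = atomic_load_explicit(&q->port_ifidx, memory_order_relaxed);
		*addr = atomic_load_explicit(&q->addr, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&q->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Zeroing the pair when the other querier expires would be one more demo_querier_update(q, 0, 0) call in this sketch, so readers never observe a new port paired with an old address.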
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: bridge: mcast: record querier port device ifindex instead of pointer
Currently, when a querier port is detected, its net_bridge_port pointer is
recorded. It is used only for comparisons, so a stale pointer is fine,
but dereferencing and using the port pointer would require proper
accounting of its usage, which adds unnecessary complexity. To solve the
problem we can store the netdevice ifindex instead of the port pointer
and retrieve the bridge port from it. This is best effort: the device
must be validated as still being part of that bridge before use, but
that is a small price to pay for avoiding querier reference counting
for each port/vlan.
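A minimal sketch of the "store the ifindex, validate on use" idea, in plain C. The device table, struct demo_dev and both helpers are hypothetical stand-ins, not the kernel's netdevice lookup functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a net device. */
struct demo_dev {
	int ifindex;
	int master_ifindex;	/* bridge it is enslaved to, 0 if none */
};

/* Look a device up by ifindex; NULL if it no longer exists. */
static struct demo_dev *demo_dev_get_by_index(struct demo_dev *tbl, size_t n,
					      int ifindex)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ifindex == ifindex)
			return &tbl[i];
	return NULL;
}

/*
 * Resolve a recorded querier port ifindex to a port of this bridge.
 * Best effort: the device may be gone or may have moved to another bridge.
 */
static struct demo_dev *demo_querier_port(struct demo_dev *tbl, size_t n,
					  int bridge_ifindex, int port_ifindex)
{
	struct demo_dev *dev;

	if (port_ifindex == 0)		/* no external querier recorded */
		return NULL;
	dev = demo_dev_get_by_index(tbl, n, port_ifindex);
	if (!dev || dev->master_ifindex != bridge_ifindex)
		return NULL;
	return dev;
}
```

Nothing is pinned by the recorded integer, so no per-port or per-vlan reference has to be taken or released; the cost is the validation step on every use.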
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Aug 2021 12:59:10 +0000 (13:59 +0100)]
Merge branch 'devlink-cleanup-for-delay-event'
Leon Romanovsky says:
====================
Devlink cleanup for delay event series
Jakub's request to make sure that devlink events are delayed and not
printed until they are fully accessible [1] requires us to implement a
delayed event notification system in devlink.
In order to do it, I moved some of my patches (xarray, etc.) from the future
series to be before "Move devlink_register to be near devlink_reload_enable" [2].
That allows us to rely on the DEVLINK_REGISTERED xarray mark to decide whether
to print an event or not.
The other patches are simple cleanups that are needed anyway.
Next in the queue:
* Delay event series
* Move devlink_register to be near devlink_reload_enable
* Extension of devlink_ops to be set dynamically
* devlink_reload_* delete
* Devlink locks rework to use xarray and reference counting
* ????
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The devlink pointer always exists after hclge_devlink_init() succeeds.
Remove that check together with the NULL setting after release and ensure
that devlink_register is the last command prior to the call to
devlink_reload_enable().
Fixes: b741269b2759 ("net: hns3: add support for registering devlink for PF")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 14 Aug 2021 09:57:30 +0000 (12:57 +0300)]
devlink: Clear whole devlink_flash_notify struct
The { 0 } doesn't clear all fields in the struct, but tells the
compiler to set all fields to zero and doesn't touch any sub-fields
if they exist.
The {} is an empty initialiser that instructs the compiler to fully
initialize the whole struct including sub-fields, which is error-prone
for future devlink_flash_notify extensions.
Fixes: 6700acc5f1fe ("devlink: collect flash notify params into a struct")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 14 Aug 2021 09:57:29 +0000 (12:57 +0300)]
devlink: Use xarray to store devlink instances
We can use an xarray instead of linearly organized linked lists for the
devlink instances. This will let us revise the locking scheme in favour
of internal xarray locking that protects the database.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 14 Aug 2021 09:57:28 +0000 (12:57 +0300)]
devlink: Count struct devlink consumers
The struct devlink itself is protected by an internal lock and doesn't
need a global lock during operation. That global lock is used to protect
addition/removal of devlink instances to/from the global list in use by
all devlink consumers in the system.
The future conversion of the linked list to an xarray will allow us to
actually delete that lock, but first we need to count all struct devlink
users.
The reference counting provides us a way to ensure that no new user
space command succeeds in grabbing a devlink instance that is about to
be destroyed, which makes it safe to access the instance without the lock.
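The "no new user can grab an instance that is going away" property can be sketched with a get-unless-zero reference count. The sketch below uses C11 atomics and hypothetical demo_* names; it is an assumption about the general technique, not the devlink implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a counted devlink instance. */
struct demo_devlink {
	atomic_uint refcount;
};

/* The creator holds the initial reference. */
static void demo_devlink_init(struct demo_devlink *d)
{
	atomic_init(&d->refcount, 1);
}

/* Take a reference unless the count has already dropped to zero. */
static bool demo_devlink_try_get(struct demo_devlink *d)
{
	unsigned int old = atomic_load(&d->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last one is gone. */
static bool demo_devlink_put(struct demo_devlink *d)
{
	unsigned int old = atomic_fetch_sub(&d->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

Once the final put has returned true, every later demo_devlink_try_get() fails, so teardown can proceed without a global lock guarding each access.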
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Romanovsky [Sat, 14 Aug 2021 09:57:26 +0000 (12:57 +0300)]
devlink: Simplify devlink_pernet_pre_exit call
The devlink_pernet_pre_exit() will be called if a net namespace exits.
That routine is relevant for devlink instances that were assigned to
that namespace first. This assignment is possible only with the following
command: "devlink reload DEV netns ...", which already checks reload support.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 14 Aug 2021 10:37:25 +0000 (11:37 +0100)]
Merge branch 'mptcp-improve-backup-subflows'
Mat Martineau says:
====================
mptcp: Improve use of backup subflows
Multipath TCP combines multiple TCP subflows into one stream, and the
MPTCP-level socket must decide which subflow to use when sending (or
resending) chunks of data. The choice of the "best" subflow to transmit
on can vary depending on the priority (normal or backup) for each
subflow and how well the subflow is performing.
In order to improve MPTCP performance when some subflows are failing,
this patch set changes how backup subflows are utilized and introduces
tracking of "stale" subflows that are still connected but not making
progress.
Patch 1 adjusts MPTCP-level retransmit timeouts to use data from all
subflows.
Patch 2 makes MPTCP-level retransmissions less aggressive to avoid
resending data that's still queued at the TCP level.
Patch 3 changes the way pending data is handled when subflows are
closed. Unacked MPTCP-level data still in the subflow tx queue is
immediately moved to another subflow for transmission instead of waiting
for MPTCP-level timeouts to trigger retransmission.
Patch 4 has some sysctl code cleanup.
Patches 5 and 6 add tracking of "stale" subflows, so only underlying TCP
subflow connections that appear to be making progress are considered
when selecting a subflow to (re)transmit data. How fast a subflow goes
stale is configurable with a per-namespace sysctl. Related MIBs are
added too.
Patch 7 makes sure the backup flag is always correctly recorded when the
MP_JOIN SYN/ACK is received for an added subflow.
Patch 8 adds more test cases for backup subflows and stale subflows.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:48 +0000 (15:15 -0700)]
selftests: mptcp: add testcase for active-back
Add more test cases for link failure scenarios,
including recovery from link failure using only
backup subflows and bi-directional transfer.
Additionally, explicitly check the stale count.
Co-developed-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:47 +0000 (15:15 -0700)]
mptcp: backup flag from incoming MPJ ack option
The parsed incoming backup flag is not propagated
to the subflow itself, so the client may end up using it
to send data.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/191
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:46 +0000 (15:15 -0700)]
mptcp: add mibs for stale subflows processing
This allows monitoring exceptional events like
active backup scenarios.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:45 +0000 (15:15 -0700)]
mptcp: faster active backup recovery
The msk can use backup subflows to transmit in-sequence data
only if there are no other active subflows. In an active backup
scenario, the MPTCP connection can make forward progress only
due to MPTCP retransmissions - rtx can pick backup subflows.
This patch introduces a new flag for MPTCP subflows: if the
underlying TCP connection made no progress for a long time,
and there are other less problematic subflows available, the
given subflow becomes stale.
Stale subflows are not considered active: if all non-backup
subflows become stale, the MPTCP scheduler can pick backup
subflows for plain transmissions.
Stale subflows can return to the active state as soon as any reply
from the peer is observed.
Active backup scenarios can now leverage the available b/w
with no restriction.
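The selection rule described above can be sketched as a small C function. It is deliberately simplified: the real scheduler also weighs how well each subflow is performing, and struct demo_subflow and demo_pick_subflow() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subflow state relevant to the rule. */
struct demo_subflow {
	bool backup;	/* backup priority */
	bool stale;	/* no progress observed for a long time */
};

/*
 * Pick a subflow for plain transmission: stale subflows are skipped,
 * a non-backup subflow wins if one is usable, and a backup subflow is
 * used only when no non-backup subflow is left. Returns -1 if none fits.
 */
static int demo_pick_subflow(const struct demo_subflow *sf, size_t n)
{
	int backup = -1;

	assert(sf != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sf[i].stale)
			continue;
		if (!sf[i].backup)
			return (int)i;
		if (backup < 0)
			backup = (int)i;
	}
	return backup;
}
```

Clearing the stale flag when a reply from the peer is seen makes the non-backup subflow eligible again on the next call.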
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:44 +0000 (15:15 -0700)]
mptcp: cleanup sysctl data and helpers
Reorder the data in mptcp_pernet to avoid wasting space
for no reason and constify the access helpers.
No functional changes intended.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:43 +0000 (15:15 -0700)]
mptcp: handle pending data on closed subflow
The PM can close an active subflow, e.g. due to an ingress RM_ADDR
option. Such a subflow could carry data still unacked at the
MPTCP-level, both in the write and the rtx_queue, which has
never reached the other peer.
Currently the mptcp-level retransmission will deliver such data,
but at a very low rate (at most 1 DSM for each MPTCP rtx interval).
We can speed up the recovery a lot by moving all the unacked data in the
tcp write_queue, so that it will be pushed again via other
subflows, at the speed allowed by them.
Also make the new helper available for later patches.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:42 +0000 (15:15 -0700)]
mptcp: less aggressive retransmission strategy
The current mptcp re-inject strategy is very aggressive:
we have mptcp-level retransmissions even on a single subflow
connection, if the link in use is lossy.
Let's be a little more conservative: we retransmit
only if at least one subflow has empty write and rtx queues.
Additionally, use the backup subflows only if the active
subflows are stale (no progress in at least an rtx period),
and ignore stale subflows for the rtx timeout update.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/207
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 13 Aug 2021 22:15:41 +0000 (15:15 -0700)]
mptcp: more accurate timeout
As reported by Maxim, we have a lot of MPTCP-level
retransmissions when multiple links with different latencies
are in use.
This patch refactors the mptcp-level timeout accounting so that
the maximum of all the active subflow timeouts is used. To avoid
traversing the subflow list multiple times, the update is
performed inside the packet scheduler.
Additionally, clean up timeout handling a bit.
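Taking the maximum over the active subflows can be sketched as follows. The struct, the function name and the fallback parameter are hypothetical; the sketch only shows the reduction, not where the scheduler performs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-subflow view: is it active, and what is its timeout. */
struct demo_subflow_rto {
	bool active;
	unsigned long timeout;	/* in arbitrary ticks */
};

/*
 * MPTCP-level timeout: the largest timeout among active subflows, or
 * 'fallback' (an assumption of this sketch) when none is active.
 */
static unsigned long demo_msk_timeout(const struct demo_subflow_rto *sf,
				      size_t n, unsigned long fallback)
{
	unsigned long max = 0;

	assert(sf != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (sf[i].active && sf[i].timeout > max)
			max = sf[i].timeout;
	return max ? max : fallback;
}
```

Using the maximum means the slowest active link sets the MPTCP-level timer, so a fast link's short timeout no longer triggers retransmission of data still in flight on a slower one.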
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 12 Aug 2021 18:33:58 +0000 (20:33 +0200)]
ethernet: fix PTP_1588_CLOCK dependencies
The 'imply' keyword does not do what most people think it does, it only
politely asks Kconfig to turn on another symbol, but does not prevent
it from being disabled manually or built as a loadable module when the
user is built-in. In the ICE driver, the latter now causes a link failure:
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_eth_ioctl':
ice_main.c:(.text+0x13b0): undefined reference to `ice_ptp_get_ts_config'
ice_main.c:(.text+0x13b0): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_get_ts_config'
aarch64-linux-ld: ice_main.c:(.text+0x13bc): undefined reference to `ice_ptp_set_ts_config'
ice_main.c:(.text+0x13bc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_set_ts_config'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_prepare_for_reset':
ice_main.c:(.text+0x31fc): undefined reference to `ice_ptp_release'
ice_main.c:(.text+0x31fc): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `ice_ptp_release'
aarch64-linux-ld: drivers/net/ethernet/intel/ice/ice_main.o: in function `ice_rebuild':
This is a recurring problem in many drivers, and we have discussed
it several times before, without reaching a consensus. I'm providing
a link to the previous email thread for reference, which discusses
some related problems.
To solve the dependency issue better than the 'imply' keyword, introduce a
separate Kconfig symbol "CONFIG_PTP_1588_CLOCK_OPTIONAL" that any driver
can depend on if it is able to use PTP support when available, but works
fine without it. Whenever CONFIG_PTP_1588_CLOCK=m, those drivers are
then prevented from being built-in, the same way as with a 'depends on
PTP_1588_CLOCK || !PTP_1588_CLOCK' dependency that does the same trick,
but that can be rather confusing when you first see it.
Since this should cover the dependencies correctly, the IS_REACHABLE()
hack in the header is no longer needed now, and can be turned back
into a normal IS_ENABLED() check. Any driver that gets the dependency
wrong will now cause a link time failure rather than being unable to use
PTP support when that is in a loadable module.
However, the two recently added ptp_get_vclocks_index() and
ptp_convert_timestamp() interfaces are only called from builtin code with
ethtool and socket timestamps, so keep the current behavior by stubbing
those out completely when PTP is in a loadable module. This should be
addressed properly in a follow-up.
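The "stub it out when the provider is not built in" pattern can be sketched with plain preprocessor conditionals. DEMO_PTP_BUILTIN, demo_ptp_get_index() and the chosen error code are hypothetical; the kernel header uses its own Kconfig-derived macros and its own stub return values.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical build-time switch: 1 when the provider is built in. */
#ifndef DEMO_PTP_BUILTIN
#define DEMO_PTP_BUILTIN 0
#endif

#if DEMO_PTP_BUILTIN
/* The real implementation would live in the provider's object file. */
int demo_ptp_get_index(int phc);
#else
/* Stub: built-in callers still link, the feature is simply unavailable. */
static inline int demo_ptp_get_index(int phc)
{
	(void)phc;
	return -EOPNOTSUPP;
}
#endif

/* A built-in caller that works with or without the provider. */
static int demo_report_clock(int phc)
{
	int idx;

	assert(phc >= 0);
	idx = demo_ptp_get_index(phc);
	return idx < 0 ? -1 : idx;
}
```

Because the stub is a static inline in the header, a built-in caller never references a symbol that only exists in a loadable module, which is the class of link failure shown earlier in this message.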
As Richard suggested, we may want to actually turn PTP support into a
'bool' option later on, preventing it from being a loadable module
altogether, which would be one way to solve the problem with the ethtool
interface.
Add support for SFP cages connected to the Marvell 88E1512 transceiver.
The 88E1512 supports the SGMII/1000Base-X/100Base-FX media types with RGMII
on the system interface. Configure the PHY to the appropriate mode depending
on the type of SFP inserted. On SFP removal, configure the PHY to the
RGMII-copper mode so the RJ-45 port can still work.
Jakub Kicinski [Fri, 13 Aug 2021 23:30:37 +0000 (16:30 -0700)]
Merge branch 'kconfig-symbol-clean-up-on-net'
Lukas Bulwahn says:
====================
Kconfig symbol clean-up on net
The script ./scripts/checkkconfigsymbols.py warns on invalid references to
Kconfig symbols (often, minor typos, name confusions or outdated references).
This patch series addresses all issues reported by
./scripts/checkkconfigsymbols.py in ./net/ and ./drivers/net/ for Kconfig
and Makefile files. Issues in the Kconfig and Makefile files indicate some
shortcomings in the overall build definitions, and often are true actionable
issues to address.
These issues can be identified and filtered by:
  ./scripts/checkkconfigsymbols.py \
    | grep -E "(drivers/)?net/.*(Kconfig|Makefile)" -B 1 -A 1
After applying this patch series on linux-next (next-20210811), the command
above yields no further issues to address.
====================
Lukas Bulwahn [Thu, 12 Aug 2021 08:38:05 +0000 (10:38 +0200)]
net: 802: remove dead leftover after ipx driver removal
Commit 7a2e838d28cf ("staging: ipx: delete it from the tree") removes the
ipx driver and the config IPX. Since then, there is some dead leftover in
./net/802/, that was once used by the IPX driver, but has no other user.
Remove this dead leftover.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Bulwahn [Thu, 12 Aug 2021 08:38:04 +0000 (10:38 +0200)]
net: Kconfig: remove obsolete reference to config MICROBLAZE_64K_PAGES
Commit 05cdf457477d ("microblaze: Remove noMMU code") removes config
MICROBLAZE_64K_PAGES in arch/microblaze/Kconfig. However, there is still
a reference to MICROBLAZE_64K_PAGES in the config VMXNET3 in
./drivers/net/Kconfig.
Remove this obsolete reference to config MICROBLAZE_64K_PAGES.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Joakim Zhang [Thu, 12 Aug 2021 07:09:48 +0000 (15:09 +0800)]
net: fec: add WoL support for i.MX8MQ
By default the FEC driver treats irq[0] (i.e. int0 described in dt-binding) as
the wakeup interrupt, but this changed on the i.MX8M series, where the SoC
integration mixes the wakeup interrupt signal into the int2 interrupt line.
This patch introduces FEC_QUIRK_WAKEUP_FROM_INT2 to indicate int2 as the wakeup
interrupt for i.MX8MQ.
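How a quirk bit can select the wakeup interrupt index is sketched below. The DEMO_ flag, its bit position and the helper are hypothetical and only mirror the idea behind FEC_QUIRK_WAKEUP_FROM_INT2.

```c
#include <assert.h>

/* Hypothetical quirk bit; the real flag's value is not taken from the driver. */
#define DEMO_QUIRK_WAKEUP_FROM_INT2	(1u << 0)

static_assert(DEMO_QUIRK_WAKEUP_FROM_INT2 != 0, "quirk must occupy a bit");

/* Index into the irq[] array that should be armed as the wakeup source. */
static int demo_wake_irq_index(unsigned int quirks)
{
	return (quirks & DEMO_QUIRK_WAKEUP_FROM_INT2) ? 2 : 0;
}
```

Parts without the quirk keep using index 0, so existing behaviour is untouched.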
ravb: Remove checks for unsupported internal delay modes
The EtherAVB instances on the R-Car E3/D3 and RZ/G2E SoCs do not support
TX clock internal delay modes, and the EtherAVB driver prints a warning
if an unsupported "rgmii-*id" PHY mode is specified, to catch buggy
DTBs.
Commit a6f51f2efa742df0 ("ravb: Add support for explicit internal
clock delay configuration") deprecated deriving the internal delay mode
from the PHY mode, in favor of explicit configuration using the now
mandatory "rx-internal-delay-ps" and "tx-internal-delay-ps" properties,
thus delegating the warning to the legacy fallback code.
Since explicit configuration of a (valid) internal clock delay
configuration is enforced by validating device tree source files against
DT binding files, and all upstream DTS files have been converted as of
commit a5200e63af57d05e ("arm64: dts: renesas: rzg2: Convert EtherAVB to
explicit delay handling"), the checks in the legacy fallback code can be
removed.
Jussi Maki [Thu, 12 Aug 2021 14:52:41 +0000 (14:52 +0000)]
net, bonding: Disallow vlan+srcmac with XDP
The new vlan+srcmac xmit policy is not implementable with XDP since
in many cases the 802.1Q payload is not present in the packet. This
can be for example due to hardware offload or in the case of veth
due to use of skbuffs internally.
This also fixes the NULL deref with the vlan+srcmac xmit policy
reported by Jonathan Toppins by additionally checking the skb
pointer.
Fixes: a815bde56b15 ("net, bonding: Refactor bond_xmit_hash for use with xdp_buff")
Reported-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jussi Maki <joamaki@gmail.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Link: https://lore.kernel.org/r/20210812145241.12449-1-joamaki@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733d5 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f25802c ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda659d ("bnxt_en: Event handler for PPS events")
kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb0f ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04e7 ("bpf: Add ambient BPF runtime context stored in current")
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557dc5 ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a37679 ("net/mlx5: Refcount mlx5_irq with integer")
MAINTAINERS
  7b637cd52f02 ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver")
Linus Torvalds [Fri, 13 Aug 2021 02:24:03 +0000 (16:24 -1000)]
Merge tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Networking fixes, including fixes from netfilter, bpf, can and
ieee802154.
The size of this is pretty normal, but we got more fixes for 5.14
changes this week than last week. Nothing major but the trend is the
opposite of what we like. We'll see how the next week goes..
Current release - regressions:
- r8169: fix ASPM-related link-up regressions
- bridge: fix flags interpretation for extern learn fdb entries
- phy: micrel: fix link detection on ksz87xx switch
- Revert "tipc: Return the correct errno code"
- ptp: fix possible memory leak caused by invalid cast
Current release - new code bugs:
- bpf: add missing bpf_read_[un]lock_trace() for syscall program
- bpf: fix potentially incorrect results with bpf_get_local_storage()
- page_pool: mask the page->signature before the checking, avoid dma
mapping leaks
- netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
- bnxt_en: fix firmware interface issues with PTP
- mlx5: Bridge, fix ageing time
Previous releases - regressions:
- linkwatch: fix failure to restore device state across
suspend/resume
- bareudp: fix invalid read beyond skb's linear data
Previous releases - always broken:
- bpf: fix integer overflow involving bucket_size
- ppp: fix issues when desired interface name is specified via
netlink
- wwan: mhi_wwan_ctrl: fix possible deadlock
- dsa: microchip: ksz8795: fix number of VLAN related bugs
- dsa: drivers: fix broken backpressure in .port_fdb_dump
- dsa: qca: ar9331: make proper initial port defaults
Misc:
- bpf: add lockdown check for probe_write_user helper
- netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
out
- netfilter: conntrack: collect all entries in one cycle,
heuristically slow down garbage collection scans on idle systems to
prevent frequent wake ups"
* tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
vsock/virtio: avoid potential deadlock when vsock device remove
wwan: core: Avoid returning NULL from wwan_create_dev()
net: dsa: sja1105: unregister the MDIO buses during teardown
Revert "tipc: Return the correct errno code"
net: mscc: Fix non-GPL export of regmap APIs
net: igmp: increase size of mr_ifc_count
MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
net: pcs: xpcs: fix error handling on failed to allocate memory
net: linkwatch: fix failure to restore device state across suspend/resume
net: bridge: fix memleak in br_add_if()
net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
net: bridge: fix flags interpretation for extern learn fdb entries
net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
bpf, core: Fix kernel-doc notation
net: igmp: fix data-race in igmp_ifc_timer_expire()
net: Fix memory leak in ieee802154_raw_deliver
...
Linus Torvalds [Fri, 13 Aug 2021 02:16:01 +0000 (16:16 -1000)]
Merge tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis
and a reference handling fix from Jeff that should address some memory
corruption reports in the snaprealm area.
Both marked for stable"
* tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client:
ceph: take snap_empty_lock atomically with snaprealm refcount change
ceph: reduce contention in ceph_check_delayed_caps()
i915:
- GVT fix for Windows VM hang.
- Display fix of 12 BPC bits for display 12 and newer.
- Don't try to access some media register for fused off domains.
- Fix kerneldoc build warnings.
* tag 'drm-fixes-2021-08-13' of git://anongit.freedesktop.org/drm/drm:
drm/doc/rfc: drop lmem uapi section
drm/i915: Only access SFC_DONE when media domain is not fused off
drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg
drm/amd/display: use GFP_ATOMIC in amdgpu_dm_irq_schedule_work
drm/amd/display: Remove invalid assert for ODM + MPC case
drm/amd/pm: bug fix for the runtime pm BACO
drm/amdgpu: handle VCN instances when harvesting (v2)
drm/meson: fix colour distortion from HDR set during vendor u-boot
drm/i915/gvt: Fix cached atomics setting for Windows VM
drm/amdgpu: Add preferred mode in modeset when freesync video mode's enabled.
drm/amd/pm: Fix a memory leak in an error handling path in 'vangogh_tables_init()'
drm/amdgpu: don't enable baco on boco platforms in runpm
drm/amdgpu: set RAS EEPROM address from VBIOS
drm/amd/pm: update smu v13.0.1 firmware header
drm/mediatek: Fix cursor plane no update
drm/mediatek: mtk-dpi: Set out_fmt from config if not the last bridge
drm/mediatek: dpi: Fix NULL dereference in mtk_dpi_bridge_atomic_check
Alex Elder [Wed, 11 Aug 2021 14:18:02 +0000 (09:18 -0500)]
dt-bindings: net: qcom,ipa: make imem interconnect optional
On some newer SoCs, the interconnect between IPA and SoC internal
memory (imem) is not used. Update the binding to indicate that
having just the memory and config interconnects is another allowed
configuration.
It isn't required, but all callers of ipa_aggr_granularity_val()
pass a constant value (IPA_AGGR_GRANULARITY) as the usec argument.
Two of those callers are in ipa_validate_build(), with the result
being passed to BUILD_BUG_ON().
Evidently the "sparc64-linux-gcc" compiler (at least) doesn't always
inline ipa_aggr_granularity_val(), so the result of the function is
not constant at compile time, and that leads to build errors.
Define the function with the __always_inline attribute to avoid the
errors. We can see by inspection that the value passed is never
zero, so we can just remove its WARN_ON() call.
Fixes: 5bc5588466a1f ("net: ipa: use WARN_ON() rather than assertions")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20210811135948.2634264-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Dave Airlie [Thu, 12 Aug 2021 20:29:12 +0000 (06:29 +1000)]
Merge tag 'drm-intel-fixes-2021-08-12' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- GVT fix for Windows VM hang.
- Display fix of 12 BPC bits for display 12 and newer.
- Don't try to access some media register for fused off domains.
- Fix kerneldoc build warnings.
Jakub Kicinski [Thu, 12 Aug 2021 18:50:16 +0000 (11:50 -0700)]
Merge tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
ieee802154 for net 2021-08-12
Mostly fixes coming from bot reports. Dongliang Mu tackled some syzkaller
reports in hwsim again and Takeshi Misawa a memory leak in ieee802154 raw.
* tag 'ieee802154-for-davem-2021-08-12' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan:
net: Fix memory leak in ieee802154_raw_deliver
ieee802154: hwsim: fix GPF in hwsim_new_edge_nl
ieee802154: hwsim: fix GPF in hwsim_set_edge_lqi
====================
lock_sock() may schedule voluntarily when the 'sk' is owned by
another thread at the same time, and we would then receive a
"scheduling while atomic" warning.
Even worse, if the next task (selected by the scheduler) tries to
release a 'sk', it needs to acquire vsock_table_lock and a deadlock
occurs, putting the system into a softlockup state.
Call trace:
queued_spin_lock_slowpath
vsock_remove_bound
vsock_remove_sock
virtio_transport_release
__vsock_release
vsock_release
__sock_release
sock_close
__fput
____fput
So we should not require sk_lock in this case, just like the behavior
in vhost_vsock or vmci.
Linus Torvalds [Thu, 12 Aug 2021 17:20:16 +0000 (07:20 -1000)]
Merge branch 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucounts fix from Eric Biederman:
"This fixes the ucount sysctls on big endian architectures.
The counts were expanded to be longs instead of ints, and the sysctl
code was overlooked, so only the low 32bit were being processed. On
litte endian just processing the low 32bits is fine, but on 64bit big
endian processing just the low 32bits results in the high order bits
instead of the low order bits being processed and nothing works
proper.
This change took a little bit to mature as we have the SYSCTL_ZERO,
and SYSCTL_INT_MAX macros that are only usable for sysctls operating
on ints, but unfortunately are not obviously broken. Which resulted in
the versions of this change working on big endian and not on little
endian, because the int SYSCTL_ZERO when extended 64bit wound up being
0x100000000. So we only allowed values greater than 0x100000000 and
less than 0faff. Which unfortunately broken everything that tried to
set the sysctls. (First reported with the windows subsystem for
linux).
I have tested this on x86_64 64bit after first reproducing the
problems with the earlier version of this change, and then verifying
the problems do not exist when we use appropriate long min and max
values for extra1 and extra2"
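Both size mismatches described above can be reproduced portably by modelling byte order explicitly instead of relying on the host. The demo_* helpers are hypothetical, and the assumption that the int stored right after the zero is 1 is mine, not stated in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Store the low 'nbytes' bytes of v in the requested byte order. */
static void demo_store(unsigned char *out, uint64_t v, int nbytes, int big_endian)
{
	assert(nbytes >= 1 && nbytes <= 8);
	for (int i = 0; i < nbytes; i++) {
		int shift = 8 * (big_endian ? nbytes - 1 - i : i);

		out[i] = (unsigned char)(v >> shift);
	}
}

/* Load an 'nbytes'-wide value from memory in the requested byte order. */
static uint64_t demo_load(const unsigned char *in, int nbytes, int big_endian)
{
	uint64_t v = 0;

	assert(nbytes >= 1 && nbytes <= 8);
	for (int i = 0; i < nbytes; i++) {
		int shift = 8 * (big_endian ? nbytes - 1 - i : i);

		v |= (uint64_t)in[i] << shift;
	}
	return v;
}

/* What an int-sized handler reads when pointed at a 64-bit long. */
static uint64_t demo_int_view_of_long(uint64_t value, int big_endian)
{
	unsigned char bytes[8];

	demo_store(bytes, value, 8, big_endian);
	return demo_load(bytes, 4, big_endian);
}

/* What a long-sized handler reads when pointed at two adjacent ints. */
static uint64_t demo_long_view_of_ints(uint32_t first, uint32_t second,
				       int big_endian)
{
	unsigned char bytes[8];

	demo_store(bytes, first, 4, big_endian);
	demo_store(bytes + 4, second, 4, big_endian);
	return demo_load(bytes, 8, big_endian);
}
```

On the little-endian model the int view of a small long is the value itself, while on the big-endian model it is the high half; reading an int 0 followed by an int 1 as a little-endian long yields 0x100000000, matching the bogus minimum mentioned in the message.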
* 'for-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ucounts: add missing data type changes
Andy Shevchenko [Wed, 11 Aug 2021 12:48:45 +0000 (15:48 +0300)]
wwan: core: Avoid returning NULL from wwan_create_dev()
Make wwan_create_dev() return either a valid or an error pointer.
In some cases it may return NULL. Prevent this by converting
it to the respective error pointer.
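Why a NULL return slips past an error-pointer check can be sketched with a userspace imitation of the error-pointer convention. The demo_* helpers, DEMO_MAX_ERRNO and the failing allocation are hypothetical; they mimic the idea of encoding a negative errno in the pointer value and are not the kernel macros themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static inline void *demo_err_ptr(long err)
{
	assert(err < 0 && err >= -DEMO_MAX_ERRNO);
	return (void *)(intptr_t)err;
}

/* Decode the errno from an error pointer. */
static inline long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True only for pointers in the top DEMO_MAX_ERRNO values; NULL is not one. */
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

struct demo_wwan_dev {
	int id;
};

static struct demo_wwan_dev demo_dev_storage;

/* Hypothetical creator: a failed lookup is converted, NULL never escapes. */
static struct demo_wwan_dev *demo_create_dev(int fail)
{
	struct demo_wwan_dev *dev = fail ? NULL : &demo_dev_storage;

	if (!dev)
		return demo_err_ptr(-ENOMEM);
	return dev;
}
```

A caller that only tests demo_is_err() would treat a raw NULL as success and dereference it, which is why the conversion has to happen inside the creator.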
Fixes: 9a44c1cc6388 ("net: Add a WWAN subsystem")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20210811124845.10955-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
David S. Miller [Thu, 12 Aug 2021 11:45:41 +0000 (12:45 +0100)]
Merge tag 'mlx5-updates-2021-08-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 updates 2021-08-11
This series provides misc updates to mlx5.
For more information please see tag log below.
Please pull and let me know if there is any problem.
mlx5-updates-2021-08-11
Misc. cleanup for mlx5.
1) Typos and use of netdev_warn()
2) smatch cleanup
3) Minor fix to inner TTC table creation
4) Dynamic capability cache allocation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Aug 2021 10:46:21 +0000 (11:46 +0100)]
Merge branch 'dsa-cross-chip-notifiers'
Vladimir Oltean says:
====================
Improvements to the DSA tag_8021q cross-chip notifiers
This series improves cross-chip notifier error messages and addresses a
benign error message seen during reboot on a system with disjoint DSA
trees.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 11 Aug 2021 13:46:06 +0000 (16:46 +0300)]
net: dsa: tag_8021q: don't broadcast during setup/teardown
Currently, on my board with multiple sja1105 switches in disjoint trees
described in commit f66a6a69f97a ("net: dsa: permit cross-chip bridging
between all trees in the system"), rebooting the board triggers the
following benign warnings:
[ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT
[ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT
[ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT
[ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT
[ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT
[ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT
Basically switch 1 calls dsa_tag_8021q_unregister, and switch 1's TX and
RX VLANs cannot be found on switch 2's CPU port.
But why would switch 2 even attempt to delete switch 1's TX and RX
tag_8021q VLANs from its CPU port? Well, because we use dsa_broadcast,
and it is supposed that it had added those VLANs in the first place
(because in dsa_port_tag_8021q_vlan_match, all CPU ports match
regardless of their tree index or switch index).
The two trees probe asynchronously, and when switch 1 probed, it called
dsa_broadcast which did not notify the tree of switch 2, because that
didn't probe yet. But during unbind, switch 2's tree _is_ probed, so it
_is_ notified of the deletion.
Before jumping to introduce a synchronization mechanism between the
probing across disjoint switch trees, let's take a step back and see
whether we _need_ to do that in the first place.
The RX and TX VLANs of switch 1 would be needed on switch 2's CPU port
only if switch 1 and 2 were part of a cross-chip bridge. And
dsa_tag_8021q_bridge_join takes care precisely of that (but if probing
was synchronous, the bridge_join would just end up bumping the VLANs'
refcount, because they are already installed by the setup path).
Since by the time the ports are bridged, all DSA trees are already set
up, and we don't need the tag_8021q VLANs of one switch installed on the
other switches during probe time, the answer is that we don't need to
fix the synchronization issue.
So make the setup and teardown code paths call dsa_port_notify, which
notifies only the local tree, and the bridge code paths call
dsa_broadcast, which lets the other trees know as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 11 Aug 2021 13:46:05 +0000 (16:46 +0300)]
net: dsa: print more information when a cross-chip notifier fails
Currently this error message does not say a lot:
[ 32.693498] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
[ 32.699725] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
[ 32.705931] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
[ 32.712139] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
[ 32.718347] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
[ 32.724554] DSA: failed to notify tag_8021q VLAN deletion: -ENOENT
but in this form, it is immediately obvious (at least to me) what the
problem is, even without further looking at the code:
[ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT
[ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT
[ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT
[ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT
[ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT
[ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Vetter [Tue, 10 Aug 2021 14:27:48 +0000 (16:27 +0200)]
drm/doc/rfc: drop lmem uapi section
We still have quite a bit more work to do with overall reworking of
the ttm-based dg1 code, but the uapi stuff is now finalized with the
latest pull. So remove that.
This also fixes kerneldoc build warnings because we've included the
same headers in two places, resulting in sphinx complaining about
duplicated symbols. This regression has been created when we moved the
uapi definitions to the real include/uapi/ folder in 727ecd99a4c9
("drm/doc/rfc: drop the i915_gem_lmem.h header")
v2: Fix a few references that I missed, the htmldocs build took
forever.
Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> (v1)
References: https://lore.kernel.org/dri-devel/20210603193242.1ce99344@canb.auug.org.au/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 727ecd99a4c9 ("drm/doc/rfc: drop the i915_gem_lmem.h header") Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210810142748.1983271-1-daniel.vetter@ffwll.ch
(cherry picked from commit dae2d28832968751f7731336b560a4a84a197b76) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Matt Roper [Fri, 6 Aug 2021 17:41:30 +0000 (10:41 -0700)]
drm/i915: Only access SFC_DONE when media domain is not fused off
The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6
forcewake domain and is not accessible if the vdbox in that domain is
fused off and the forcewake is not initialized.
This mistake went unnoticed because until recently we were using the
wrong register offset for the SFC_DONE register; once the register
offset was corrected, we started hitting errors like
Andy Shevchenko [Wed, 11 Aug 2021 13:39:32 +0000 (16:39 +0300)]
wwan: core: Unshadow error code returned by ida_alloc_range()
ida_alloc_range() may return an error code other than -ENOMEM.
Unshadow it in wwan_create_port().
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Reviewed-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ankit Nautiyal [Wed, 11 Aug 2021 05:18:57 +0000 (10:48 +0530)]
drm/i915/display: Fix the 12 BPC bits for PIPE_MISC reg
Till DISPLAY12 the PIPE_MISC bits 5-7 are used to set the
Dithering BPC, with valid values of 6, 8, 10 BPC.
For ADLP+ these bits are used to set the PORT OUTPUT BPC, with valid
values of: 6, 8, 10, 12 BPC, and need to be programmed whether
dithering is enabled or not.
This patch:
-corrects the bits 5-7 for PIPE MISC register for 12 BPC.
-renames the bits and mask to have generic names for these bits for
dithering bpc and port output bpc.
v3: Added a note for MIPI DSI which uses the PIPE_MISC for readout
for pipe_bpp. (Uma Shankar)
v2: Added 'display' to the subject and fixes tag. (Uma Shankar)
Fixes: 756f85cffef2 ("drm/i915/bdw: Broadwell has PIPEMISC") Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> (v1) Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v3.13+ Signed-off-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Uma Shankar <uma.shankar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210811051857.109723-1-ankit.k.nautiyal@intel.com
(cherry picked from commit 70418a68713c13da3f36c388087d0220b456a430) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Vladimir Oltean [Wed, 11 Aug 2021 11:59:45 +0000 (14:59 +0300)]
net: dsa: sja1105: unregister the MDIO buses during teardown
The call to sja1105_mdiobus_unregister is present in the error path but
absent from the main driver unbind path.
Fixes: 5a8f09748ee7 ("net: dsa: sja1105: register the MDIO buses for 100base-T1 and 100base-TX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
DENG Qingfang [Wed, 11 Aug 2021 09:50:43 +0000 (17:50 +0800)]
net: dsa: mt7530: fix VLAN traffic leaks again
When a port leaves a VLAN-aware bridge, the current code does not clear
other ports' matrix field bit. If the bridge is later set to VLAN-unaware
mode, traffic in the bridge may leak to that port.
Remove the VLAN filtering check in mt7530_port_bridge_leave.
Fixes: 474a2ddaa192 ("net: dsa: mt7530: fix VLAN traffic leaks") Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 12 Aug 2021 09:50:33 +0000 (10:50 +0100)]
Merge branch 'pktgen-imix'
Nick Richardson says:
====================
pktgen: Add IMIX mode
Adds internet mix (IMIX) mode to pktgen. Internet mix is
included in many user-space network perf testing tools. It allows
the user to specify a distribution of discrete packet sizes to be
generated. This type of test is common among vendors when perf testing
their devices [link: https://datatracker.ietf.org/doc/html/rfc2544#section-9.1].
This allows users to get a more complete picture of how their device
will perform in the real world.
This feature adds a command that allows users to specify an imix
distribution in the following format:
imix_weights size_1,weight_1 size_2,weight_2 ... size_n,weight_n
The distribution of packets with size_i will be
(weight_i / total_weights) where
total_weights = weight_1 + weight_2 + ... + weight_n
For example:
imix_weights 40,7 576,4 1500,1
The pkt_size "40" will account for 7 / (7 + 4 + 1) = ~58% of the total
packets sent.
This patch was tested with the following:
1. imix_weights = 40,7 576,4 1500,1
2. imix_weights = 0,7 576,4 1500,1
- Packet size of 0 is resized to the minimum, 42
3. imix_weights = 40,7 576,4 1500,1 count = 0
- Zero count.
- Runs until user stops pktgen.
Invalid Configurations
1. clone_skb = 200 imix_weights = 40,7 576,4 1500,1
- Returns error code -524 (-ENOTSUPP) when setting imix_weights
2. len(imix_weights) > MAX_IMIX_ENTRIES
- Returns -7 (-E2BIG)
This patch is split into three parts, each providing a different aspect
of the required functionality:
1. Parse internet mix input.
2. Add IMIX Distribution representation.
3. Process and output IMIX results.
Changes in v2:
* Remove __ prefix outside of uAPI.
* Use seq_puts instead of seq_printf where necessary.
* Reorder variable declaration.
* Return -EINVAL instead of -ENOTSUPP when using IMIX with clone_skb > 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nick Richardson [Tue, 10 Aug 2021 19:01:55 +0000 (19:01 +0000)]
pktgen: Add output for imix results
The bps for imix mode is calculated by:
sum(imix_entry.size) / time_elapsed
The actual counts of each imix_entry are displayed under the
"Current:" section of the interface output in the following format:
imix_size_counts: size_1,count_1 size_2,count_2 ... size_n,count_n
Nick Richardson [Tue, 10 Aug 2021 19:01:54 +0000 (19:01 +0000)]
pktgen: Add imix distribution bins
In order to represent the distribution of imix packet sizes, a
pre-computed data structure is used. It features 100 (IMIX_PRECISION)
"bins". Contiguous ranges of these bins represent the respective
packet size of each imix entry. This is done to avoid the overhead of
selecting the correct imix packet size based on the corresponding weights.
pkt_size 40 occurs 7/total_weight = 58% of the time
pkt_size 576 occurs 4/total_weight = 33% of the time
pkt_size 1500 occurs 1/total_weight = 9% of the time
We generate a random number, reduce it modulo 100 to the range 0-99, and
select the corresponding packet size based on the specified weights.
E.g. random number = 358723895 % 100 = 95
Selects the packet size corresponding to index 95 in the pre-computed
imix_distribution array.
An example of the pre-computed imix_distribution array is below:
0 -> 0 (index of imix_entry.size == 40)
1 -> 0 (index of imix_entry.size == 40)
2 -> 0 (index of imix_entry.size == 40)
[...] -> 0 (index of imix_entry.size == 40)
57 -> 0 (index of imix_entry.size == 40)
58 -> 1 (index of imix_entry.size == 576)
[...] -> 1 (index of imix_entry.size == 576)
90 -> 1 (index of imix_entry.size == 576)
91 -> 2 (index of imix_entry.size == 1500)
[...] -> 2 (index of imix_entry.size == 1500)
99 -> 2 (index of imix_entry.size == 1500)
Create and use "bin" representation of the imix distribution.
Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
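The bin idea from the commit above can be sketched in a few lines. This is a minimal model, not the pktgen code: the function names are hypothetical, and rounding each share down while letting the last entry take the remaining bins is an assumption chosen because it reproduces the 58/33/9 split shown in the example array.

```python
IMIX_PRECISION = 100  # number of bins, as named in the commit text


def build_imix_distribution(entries):
    """Map each of IMIX_PRECISION bins to an index into entries.

    entries is a list of (size, weight) tuples.
    """
    total_weight = sum(weight for _, weight in entries)
    # Upper bin bound (exclusive) for every entry. Shares are rounded down and
    # the last entry takes whatever remains (assumption, see lead-in).
    bounds = []
    running = 0
    for _, weight in entries[:-1]:
        running += (IMIX_PRECISION * weight) // total_weight
        bounds.append(running)
    bounds.append(IMIX_PRECISION)

    distribution = []
    entry = 0
    for bin_index in range(IMIX_PRECISION):
        # Advance to the next entry once this bin is past the current bound.
        while bin_index >= bounds[entry]:
            entry += 1
        distribution.append(entry)
    return distribution


def pick_size(entries, distribution, random_number):
    """Select a packet size with a single modulo and two array lookups."""
    return entries[distribution[random_number % IMIX_PRECISION]][0]


if __name__ == "__main__":
    entries = [(40, 7), (576, 4), (1500, 1)]
    dist = build_imix_distribution(entries)
    print([dist.count(i) for i in range(len(entries))])
    print(pick_size(entries, dist, 65))
```

The point of the precomputation is that the per-packet cost is a modulo and an index, with no walk over the weights.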
Nick Richardson [Tue, 10 Aug 2021 19:01:53 +0000 (19:01 +0000)]
pktgen: Parse internet mix (imix) input
Adds "imix_weights" command for specifying internet mix distribution.
The command is in this format:
"imix_weights size_1,weight_1 size_2,weight_2 ... size_n,weight_n"
where the probability that packet size_i is picked is:
weight_i / (weight_1 + weight_2 + .. + weight_n)
The user may provide up to 100 imix entries (size_i,weight_i) in this
command.
The user specified imix entries will be displayed in the "Params"
section of the interface output.
Values of clone_skb > 0 are not supported in IMIX mode.
Summary of changes:
Add flag for enabling internet mix mode.
Add command (imix_weights) for internet mix input.
Return -ENOTSUPP when clone_skb > 0 in IMIX mode.
Display imix_weights in Params.
Create data structures to store imix entries and distribution.
Signed-off-by: Nick Richardson <richardsonnick@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
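As a rough illustration of the imix_weights format described above, the following sketch parses the argument string and computes the pick probability of each size. It is not the kernel parser: the function names are hypothetical, error handling is reduced to ValueError, and the max_entries default simply mirrors the limit of 100 quoted in the commit text.

```python
def parse_imix_weights(arg, max_entries=100):
    """Parse "size_1,weight_1 size_2,weight_2 ..." into (size, weight) tuples."""
    entries = []
    for token in arg.split():
        size_text, weight_text = token.split(",")
        entries.append((int(size_text), int(weight_text)))
    if len(entries) > max_entries:
        # Limit taken from the commit text; the kernel reports an errno instead.
        raise ValueError("too many imix entries")
    return entries


def pick_probabilities(entries):
    """Return (size, weight_i / sum of all weights) for every entry."""
    total = sum(weight for _, weight in entries)
    return [(size, weight / total) for size, weight in entries]


if __name__ == "__main__":
    parsed = parse_imix_weights("40,7 576,4 1500,1")
    for size, probability in pick_probabilities(parsed):
        print(size, round(probability * 100), "%")
```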
Hoang Le [Wed, 11 Aug 2021 01:22:09 +0000 (08:22 +0700)]
Revert "tipc: Return the correct errno code"
This reverts commit 0efea3c649f0 because of:
- Returning the -ENOBUFS error is fine on socket buffer allocation failure.
- There is a side effect in the calling path
tipc_node_xmit()->tipc_link_xmit() when checking the returned error code.
Fixes: 0efea3c649f0 ("tipc: Return the correct errno code") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Brown [Tue, 10 Aug 2021 12:37:48 +0000 (13:37 +0100)]
net: mscc: Fix non-GPL export of regmap APIs
The ocelot driver makes use of regmap, wrapping it with driver specific
operations that are thin wrappers around the core regmap APIs. These are
exported with EXPORT_SYMBOL, dropping the _GPL from the core regmap
exports which is frowned upon. Add _GPL suffixes to at least the APIs that
are doing register I/O.
Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 12 Aug 2021 05:56:10 +0000 (19:56 -1000)]
Merge tag 'seccomp-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
- Fix typo in user notification documentation (Rodrigo Campos)
- Fix userspace counter report when using TSYNC (Hsuan-Chi Kuo, Wiktor
Garbacz)
* tag 'seccomp-v5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: Fix setting loaded filter count during TSYNC
Documentation: seccomp: Fix typo in user notification
net: bridge: vlan: fix global vlan option range dumping
When global vlan options are equal sequentially we compress them in a
range to save space and reduce processing time. In order to have the
proper range end id we need to update range_end if the options are equal
otherwise we get ranges with the same end vlan id as the start.
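The range compression described above can be modelled as follows. This is a simplified sketch with hypothetical names, and it assumes that only consecutive vlan ids with equal options are merged. The line marked in the comment corresponds to the range_end update: without it every emitted range would end at its own start id.

```python
def compress_vlan_options(vlans):
    """Compress (vid, options) pairs sorted by vid into (start, end, options) ranges."""
    ranges = []
    range_start = range_end = None
    current = None
    for vid, options in vlans:
        if range_start is not None and vid == range_end + 1 and options == current:
            range_end = vid  # extend the open range; this keeps the end id correct
            continue
        if range_start is not None:
            ranges.append((range_start, range_end, current))
        # Open a new single-vlan range.
        range_start = range_end = vid
        current = options
    if range_start is not None:
        ranges.append((range_start, range_end, current))
    return ranges


if __name__ == "__main__":
    vlans = [(1, "snooping"), (2, "snooping"), (3, "snooping"), (4, "none"), (6, "none")]
    print(compress_vlan_options(vlans))
```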
Jeremy Kerr [Tue, 10 Aug 2021 02:38:34 +0000 (10:38 +0800)]
mctp: Specify route types, require rtm_type in RTM_*ROUTE messages
This change adds a 'type' attribute to routes, which can be parsed from
a RTM_NEWROUTE message. This will help to distinguish local vs. peer
routes in a future change.
This means userspace will need to set a correct rtm_type in RTM_NEWROUTE
and RTM_DELROUTE messages; we currently only accept RTN_UNICAST.
Yufeng Mo [Tue, 10 Aug 2021 13:28:48 +0000 (21:28 +0800)]
net: hns3: add support for triggering reset by ethtool
Currently, four reset types are supported for the HNS3 ethernet
driver: IMP reset, global reset, function reset, and FLR. Only
FLR can now be triggered by the user. To restore the device when
an exception occurs, add support for triggering reset by ethtool.
Run the "ethtool --reset DEVNAME mgmt | all | dedicated" to
trigger the IMP | global | function reset manually.
Sergey Shtylyov [Tue, 10 Aug 2021 20:17:12 +0000 (23:17 +0300)]
MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
I'm still going to continue looking after the Renesas Ethernet drivers and
device tree bindings. Now my new employer, Open Mobile Platform (OMP), will
pay for all my upstream work. Let's switch to my OMP email for the reviews.
Neal Cardwell [Wed, 11 Aug 2021 02:40:56 +0000 (22:40 -0400)]
tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
Currently if BBR congestion control is initialized after more than 2B
packets have been delivered, depending on the phase of the
tp->delivered counter the tracking of BBR round trips can get stuck.
The bug arises because if tp->delivered is between 2^31 and 2^32 at
the time the BBR congestion control module is initialized, then the
initialization of bbr->next_rtt_delivered to 0 will cause the logic to
believe that the end of the round trip is still billions of packets in
the future. More specifically, the following check will fail
repeatedly:
and thus the connection will take up to 2B packets delivered before
that check will pass and the connection will set:
bbr->round_start = 1;
This could cause many mechanisms in BBR to fail to trigger, for
example bbr_check_full_bw_reached() would likely never exit STARTUP.
This bug is 5 years old and has not been observed, and as a practical
matter this would likely rarely trigger, since it would require
transferring at least 2B packets, or likely more than 3 terabytes of
data, before switching congestion control algorithms to BBR.
This patch is a stable candidate for kernels as far back as v4.9,
when tcp_bbr.c was added.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Kevin Yang <yyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210811024056.235161-1-ncardwell@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
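The wrap behaviour can be illustrated with a small model. The check itself is not quoted in the text above, so the sketch assumes it is a wrap-aware 32-bit comparison in the style of TCP sequence number comparisons, (s32)(a - b) < 0; the helper names and the 1500-byte packet size are assumptions made for illustration.

```python
U32_MASK = 0xFFFFFFFF


def before(seq1, seq2):
    """Wrap-aware "seq1 is before seq2" on u32 counters: (s32)(seq1 - seq2) < 0."""
    diff = (seq1 - seq2) & U32_MASK
    return diff >= 0x80000000  # top bit set means negative as a signed 32-bit value


def round_ended(prior_delivered, next_rtt_delivered):
    """Assumed form of the end-of-round check: the marker has been reached or passed."""
    return not before(prior_delivered, next_rtt_delivered)


if __name__ == "__main__":
    # Marker initialized to 0 on a young connection: the check passes.
    print(round_ended(100, 0))
    # Marker initialized to 0 while the counter is between 2^31 and 2^32:
    # 0 looks like it lies in the future, so the check keeps failing.
    print(round_ended(2**31 + 5, 0))
    # Starting the marker at the current counter value avoids the problem.
    print(round_ended(2**31 + 5, 2**31 + 5))
    # Rough size of 2B packets at an assumed 1500 bytes each.
    print(2**31 * 1500)
```

In this model the second case only starts passing once the counter wraps past 2^32, which is the "up to 2B packets" delay mentioned in the commit message.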
Jonathan Toppins [Wed, 11 Aug 2021 02:53:30 +0000 (22:53 -0400)]
bonding: remove extraneous definitions from bonding.h
All of the symbols either only exist in bond_options.c or nowhere at
all. These symbols were verified to not exist in the code base by
using `git grep` and their removal was verified by compiling bonding.ko.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Wong Vee Khee [Tue, 10 Aug 2021 08:58:12 +0000 (16:58 +0800)]
net: pcs: xpcs: fix error handling on failed to allocate memory
Drivers such as sja1105 and stmmac that call xpcs_create() expect an
error to be returned by the pcs-xpcs module, but this was not the case
when memory allocation failed.
Fix this by returning -ENOMEM instead of a NULL pointer.
Willy Tarreau [Mon, 9 Aug 2021 16:06:28 +0000 (18:06 +0200)]
net: linkwatch: fix failure to restore device state across suspend/resume
After migrating my laptop from 4.19-LTS to 5.4-LTS a while ago I noticed
that my Ethernet port to which a bond and a VLAN interface are attached
appeared to remain up after resuming from suspend with the cable unplugged
(and that problem still persists with 5.10-LTS).
The sequence of events is as follows:
- the network driver (e1000e here) prepares to suspend, calls e1000e_down()
which calls netif_carrier_off() to signal that the link is going down.
- netif_carrier_off() adds a link_watch event to the list of events for
this device
- the device is completely stopped.
- the machine suspends
- the cable is unplugged and the machine brought to another location
- the machine is resumed
- the queued linkwatch events are processed for the device
- the device doesn't yet have the __LINK_STATE_PRESENT bit and its events
are silently dropped
- the device is resumed with its link down
- the upper VLAN and bond interfaces are never notified that the link had
been turned down and remain up
- the only way to provoke a change is to physically connect the machine
to a port and possibly unplug it.
The state after resume looks like this:
$ ip -br li | egrep 'bond|eth'
bond0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP>
eth0 DOWN e8:6a:64:64:64:64 <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP>
eth0.2@eth0 UP e8:6a:64:64:64:64 <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP>
Placing an explicit call to netdev_state_change() either in the suspend
or the resume code in the NIC driver worked around this but the solution
is not satisfying.
The issue in fact really is in link_watch that loses events while it
ought not to. It happens that the test for the device being present was
added by commit 124eee3f6955 ("net: linkwatch: add check for netdevice
being present to linkwatch_do_dev") in 4.20 to avoid an access to
devices that are not present.
Instead of dropping events, this patch proceeds slightly differently by
postponing their handling so that they happen after the device is fully
resumed.
Fixes: 124eee3f6955 ("net: linkwatch: add check for netdevice being present to linkwatch_do_dev") Link: https://lists.openwall.net/netdev/2018/03/15/62 Cc: Heiner Kallweit <hkallweit1@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20210809160628.22623-1-w@1wt.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
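The difference between dropping and postponing events can be shown with a toy model. Nothing here reflects the actual linkwatch data structures: the function name, the use of a deque and the present map are all hypothetical, and the sketch only captures the ordering problem described in the commit message above.

```python
from collections import deque


def process_events(queue, present, drop_when_absent):
    """Handle queued link events.

    Returns (devices whose upper layers were notified, events kept for later).
    """
    notified = []
    postponed = deque()
    while queue:
        dev = queue.popleft()
        if present.get(dev, False):
            notified.append(dev)
        elif not drop_when_absent:
            postponed.append(dev)  # keep the event until the device is back
        # otherwise the event is silently lost
    return notified, postponed


if __name__ == "__main__":
    # Old behaviour: the event queued before suspend is lost during resume.
    print(process_events(deque(["eth0"]), {"eth0": False}, drop_when_absent=True))
    # Postponing: the event survives and is delivered once the device is present.
    _, later = process_events(deque(["eth0"]), {"eth0": False}, drop_when_absent=False)
    print(process_events(later, {"eth0": True}, drop_when_absent=False))
```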
A recent change in LLVM causes module_{c,d}tor sections to appear when
CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings
because these are not handled anywhere:
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
Fangrui explains: "the function asan.module_ctor has the SHF_GNU_RETAIN
flag, so it is in a separate section even with -fno-function-sections
(default)".
Place them in the TEXT_TEXT section so that these technologies continue
to work with the newer compiler versions. All of the KASAN and KCSAN
KUnit tests continue to pass after this change.
Hsuan-Chi Kuo [Thu, 4 Mar 2021 23:37:08 +0000 (17:37 -0600)]
seccomp: Fix setting loaded filter count during TSYNC
The desired behavior is to set the caller's filter count to the thread's.
This value is reported via /proc, so this fixes the inaccurate count
exposed to userspace; it is not used for reference counting, etc.
Currently mlx5_core_dev contains an array of capabilities. It contains 19
valid capabilities of the device, 2 reserved entries and 12 holes.
For these 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K
bytes of memory which is never used. As a result, the mlx5_core_dev
structure size is around 270 Kbytes. This allocation is further rounded
up to the next power of 2, 512 Kbytes.
By skipping non-existent entries,
(a) 112Kbyte is saved,
(b) mlx5_core_dev reduces to 8KB with alignment
(c) 350KB saved in alignment
In future individual capability allocation can be used to skip its
allocation when such capability is disabled at the device level. This
patch prepares mlx5_core_dev to hold capability using a pointer instead
of inline array.
net/mlx5: Reorganize current and maximal capabilities to be per-type
In the current code, the current and maximal capabilities are
maintained in separate arrays which are both per type. In order to
allow the creation of such a basic structure as a dynamically
allocated array, we move curr and max fields to a unified
structure so that specific capabilities can be allocated as one unit.
Shay Drory [Tue, 29 Jun 2021 11:47:30 +0000 (14:47 +0300)]
net/mlx5: Change SF missing dedicated MSI-X err message to dbg
When the allocated MSI-X vectors are not enough for SFs to have dedicated
MSI-X, the kernel log buffer has too many entries.
Hence only enable such log with debug level.
Leon Romanovsky [Sun, 1 Aug 2021 08:37:57 +0000 (11:37 +0300)]
net/mlx5: Delete impossible dev->state checks
A new mlx5_core device structure is allocated through devlink_alloc
with kzalloc, which ensures that all fields are equal to zero,
including ->state.
That means that checks of that field in mlx5_init_one() are
completely redundant, because that function is called only once
at the beginning of the mlx5_core_dev lifetime.
PCI:
.probe()
-> probe_one()
-> mlx5_init_one()
The recovery flow can't run at that time or before it, because the
relevant work is initialized later in mlx5_init_once().
Such initialization flow ensures that dev->state can't be
MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
checks.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
David S. Miller [Wed, 11 Aug 2021 13:44:59 +0000 (14:44 +0100)]
Merge branch 'dsa-tagger-helpers'
Vladimir Oltean says:
====================
DSA tagger helpers
The goal of this series is to minimize the use of memmove and skb->data
in the DSA tagging protocol drivers. Unfiltered access to this level of
information is not very friendly to drive-by contributors, and sometimes
is also not the easiest to review.
For starters, I have converted the most common form of DSA tagging
protocols: the DSA headers which are placed where the EtherType is.
The helper functions introduced by this series are:
- dsa_alloc_etype_header
- dsa_strip_etype_header
- dsa_etype_header_pos_rx
- dsa_etype_header_pos_tx
This series is just a resend as non-RFC of v1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 10 Aug 2021 13:13:56 +0000 (16:13 +0300)]
net: dsa: create a helper for locating EtherType DSA headers on TX
Create a similar helper for locating the offset to the DSA header
relative to skb->data, and make the existing EtherType header taggers
use it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 10 Aug 2021 13:13:55 +0000 (16:13 +0300)]
net: dsa: create a helper for locating EtherType DSA headers on RX
It seems that protocol tagging driver writers are always surprised about
the formula they use to reach their EtherType header on RX, which
becomes apparent from the fact that there are comments in multiple
drivers that mention the same information.
Create a helper that returns a void pointer to skb->data - 2, as well as
centralize the explanation why that is the case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 10 Aug 2021 13:13:54 +0000 (16:13 +0300)]
net: dsa: create a helper which allocates space for EtherType DSA headers
Hide away the memmove used by DSA EtherType header taggers to shift the
MAC SA and DA to the left to make room for the header, after they've
called skb_push(). The call to skb_push() is still left explicit in
drivers, to be symmetric with dsa_strip_etype_header, and because not
all callers can be refactored to do it (for example, brcm_tag_xmit_ll
has common code for a pre-Ethernet DSA tag and an EtherType DSA tag).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
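The push-then-shift sequence described in the commit above can be modelled on a plain byte buffer. This is only a sketch: there is no skb here, the function names are hypothetical, and the 4-byte tag in the usage example is made up.

```python
ETH_ALEN = 6  # length of one MAC address in bytes


def push_and_alloc_etype_header(frame, header_len):
    """Model skb_push(header_len) followed by shifting MAC DA and SA to the left."""
    # skb_push: header_len bytes of new room appear in front of the frame.
    buf = bytearray(header_len) + bytearray(frame)
    # The memmove: DA and SA move from their old position to the new start.
    buf[0:2 * ETH_ALEN] = buf[header_len:header_len + 2 * ETH_ALEN]
    # Bytes [12, 12 + header_len) now hold stale data and are free for the tag.
    return buf


def insert_tag(frame, tag):
    """Place tag where the EtherType used to be, keeping the original EtherType after it."""
    buf = push_and_alloc_etype_header(frame, len(tag))
    buf[2 * ETH_ALEN:2 * ETH_ALEN + len(tag)] = tag
    return bytes(buf)


if __name__ == "__main__":
    da, sa = bytes(range(6)), bytes(range(6, 12))
    frame = da + sa + b"\x08\x00" + b"payload"
    print(insert_tag(frame, b"\xaa\xbb\xcc\xdd").hex())
```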
Vladimir Oltean [Tue, 10 Aug 2021 13:13:53 +0000 (16:13 +0300)]
net: dsa: create a helper that strips EtherType DSA headers on RX
All header taggers open-code a memmove that is not all that
obvious, and we can hide the details behind a helper function, since the
only thing specific to the driver is the length of the header tag.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
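The strip operation can be modelled the same way on a byte buffer. Again this is a sketch with hypothetical names and a made-up 4-byte tag, not the helper itself: the two MAC addresses are shifted to the right over the tag, and the leading bytes that become unused are then dropped.

```python
ETH_ALEN = 6  # length of one MAC address in bytes


def strip_etype_header(frame, header_len):
    """Remove the header_len bytes that follow the two MAC addresses."""
    buf = bytearray(frame)
    # The memmove: DA and SA move right by header_len, overwriting the tag.
    buf[header_len:header_len + 2 * ETH_ALEN] = buf[0:2 * ETH_ALEN]
    # The first header_len bytes are no longer part of the frame.
    return bytes(buf[header_len:])


if __name__ == "__main__":
    da, sa = bytes(range(6)), bytes(range(6, 12))
    tagged = da + sa + b"\xaa\xbb\xcc\xdd" + b"\x08\x00" + b"payload"
    print(strip_etype_header(tagged, 4).hex())
```

As the commit message notes, the only driver-specific input is the length of the header tag.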