Baruch Siach [Mon, 21 Feb 2022 11:45:57 +0000 (13:45 +0200)]
net: mdio-ipq4019: add delay after clock enable
Experimentation shows that PHY detect might fail when the code attempts
MDIO bus read immediately after clock enable. Add delay to stabilize the
clock before bus access.
PHY detect failure started to show after commit c85edd943320 ("net:
mdio: Demote probed message to debug print") that removed coincidental
delay between clock enable and bus access.
10ms is meant to match the time it take to send the probed message over
UART at 115200 bps. This might be a far overshoot.
Fixes: 2c6a8126248d ("net: mdio: Add the reset function for IPQ MDIO driver") Signed-off-by: Baruch Siach <baruch.siach@siklu.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Tao Liu [Fri, 18 Feb 2022 14:35:24 +0000 (22:35 +0800)]
gso: do not skip outer ip header in case of ipip and net_failover
We encounter a tcp drop issue in our cloud environment. Packet GROed in
host forwards to a VM virtio_net nic with net_failover enabled. VM acts
as a IPVS LB with ipip encapsulation. The full path like:
host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
-> ipip encap -> net_failover tx -> virtio_net tx
When net_failover transmits a ipip pkt (gso_type = 0x0103, which means
SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), there is no gso
did because it supports TSO and GSO_IPXIP4. But network_header points to
inner ip header.
Afterwards virtio_net transmits the pkt, only inner ip header is modified.
And the outer one just keeps unchanged. The pkt will be dropped in remote
host.
The root cause of this issue is specific with the rare combination of
SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option.
SKB_GSO_DODGY is set from external virtio_net. We need to reset network
header when callbacks.gso_segment() returns NULL.
This patch also includes ipv6_gso_segment(), considering SIT, etc.
Fixes: ec5a786c2cab ("ipip: add GSO/TSO support") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 20 Feb 2022 13:47:15 +0000 (13:47 +0000)]
Merge branch 'bnxt_en-fixes'
Michael Chan says:
====================
bnxt_en: Bug fixes
This series contains bug fixes for FEC reporting, ethtool self test,
multicast setup, devlink health reporting and live patching, and
a firmware response timeout.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Sun, 20 Feb 2022 09:05:53 +0000 (04:05 -0500)]
bnxt_en: Fix devlink fw_activate
To install a livepatch, first flash the package to NVM, and then
activate the patch through the "HWRM_FW_LIVEPATCH" fw command.
To uninstall a patch from NVM, flash the removal package and then
activate it through the "HWRM_FW_LIVEPATCH" fw command.
The "HWRM_FW_LIVEPATCH" fw command has to consider following scenarios:
1. no patch in NVM and no patch active. Do nothing.
2. patch in NVM, but not active. Activate the patch currently in NVM.
3. patch is not in NVM, but active. Deactivate the patch.
4. patch in NVM and the patch active. Do nothing.
Fix the code to handle these scenarios during devlink "fw_activate".
To install and activate a live patch:
devlink dev flash pci/0000:c1:00.0 file thor_patch.pkg
devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset
To remove and deactivate a live patch:
devlink dev flash pci/0000:c1:00.0 file thor_patch_rem.pkg
devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset
Fixes: 4fe42bdf4f21 ("bnxt_en: implement firmware live patching") Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 20 Feb 2022 09:05:52 +0000 (04:05 -0500)]
bnxt_en: Increase firmware message response DMA wait time
When polling for the firmware message response, we first poll for the
response message header. Once the valid length is detected in the
header, we poll for the valid bit at the end of the message which
signals DMA completion. Normally, this poll time for DMA completion
is extremely short (0 to a few usec). But on some devices under some
rare conditions, it can be up to about 20 msec.
Increase this delay to 50 msec and use udelay() for the first 10 usec
for the common case, and usleep_range() beyond that.
Also, change the error message to include the above delay time when
printing the timeout value.
Fixes: 2e9813a54199 ("bnxt_en: move HWRM API implementation into separate file") Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kalesh AP [Sun, 20 Feb 2022 09:05:51 +0000 (04:05 -0500)]
bnxt_en: Restore the resets_reliable flag in bnxt_open()
During ifdown, we call bnxt_inv_fw_health_reg() which will clear
both the status_reliable and resets_reliable flags if these
registers are mapped. This is correct because a FW reset during
ifdown will clear these register mappings. If we detect that FW
has gone through reset during the next ifup, we will remap these
registers.
But during normal ifup with no FW reset, we need to restore the
resets_reliable flag otherwise we will not show the reset counter
during devlink diagnose.
Fixes: 44ebeb20e86c ("bnxt_en: improve fw diagnose devlink health messages") Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavan Chebbi [Sun, 20 Feb 2022 09:05:50 +0000 (04:05 -0500)]
bnxt_en: Fix incorrect multicast rx mask setting when not requested
We should setup multicast only when net_device flags explicitly
has IFF_MULTICAST set. Otherwise we will incorrectly turn it on
even when not asked. Fix it by only passing the multicast table
to the firmware if IFF_MULTICAST is set.
Fixes: 49417e84b42a ("bnxt_en: Setup multicast properly after resetting device.") Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 20 Feb 2022 09:05:49 +0000 (04:05 -0500)]
bnxt_en: Fix occasional ethtool -t loopback test failures
In the current code, we setup the port to PHY or MAC loopback mode
and then transmit a test broadcast packet for the loopback test. This
scheme fails sometime if the port is shared with management firmware
that can also send packets. The driver may receive the management
firmware's packet and the test will fail when the contents don't
match the test packet.
Change the test packet to use it's own MAC address as the destination
and setup the port to only receive it's own MAC address. This should
filter out other packets sent by management firmware.
Fixes: 89a092f12430 ("bnxt_en: Add PHY loopback to ethtool self-test.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sun, 20 Feb 2022 09:05:48 +0000 (04:05 -0500)]
bnxt_en: Fix offline ethtool selftest with RDMA enabled
For offline (destructive) self tests, we need to stop the RDMA driver
first. Otherwise, the RDMA driver will run into unrecoverable errors
when destructive firmware tests are being performed.
The irq_re_init parameter used in the half close and half open
sequence when preparing the NIC for offline tests should be set to
true because the RDMA driver will free all IRQs before the offline
tests begin.
Fixes: ccc1290d5af9 ("bnxt_en: Add external loopback test to ethtool selftest.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Ben Li <ben.li@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Sun, 20 Feb 2022 09:05:47 +0000 (04:05 -0500)]
bnxt_en: Fix active FEC reporting to ethtool
ethtool --show-fec <interface> does not show anything when the Active
FEC setting in the chip is set to None. Fix it to properly return
ETHTOOL_FEC_OFF in that case.
Fixes: 805ebc49b8aa ("bnxt_en: Report FEC settings to ethtool.") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since the blamed commit, dsa_slave_switchdev_event_work() no longer
holds rtnl_lock(), which triggers the ASSERT_RTNL() from
__dev_set_promiscuity().
Taking rtnl_lock() around dev_uc_add() is impossible, because all the
code paths that call dsa_flush_workqueue() do so from contexts where the
rtnl_mutex is already held - so this would lead to an instant deadlock.
dev_uc_add() in itself doesn't require the rtnl_mutex for protection.
There is this comment in __dev_set_rx_mode() which assumes so:
/* Unicast addresses changes may only happen under the rtnl,
* therefore calling __dev_set_promiscuity here is safe.
*/
but it is from commit ef222f7e3f1f ("[NET]: dev: secondary unicast
address support") dated June 2007, and in the meantime, commit e285a628cbde ("netdev: Add addr_list_lock to struct net_device."), dated
July 2008, has added &dev->addr_list_lock to protect this instead of the
global rtnl_mutex.
Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection,
but it is the uncommon path of what we typically expect dev_uc_add()
to do. So since only the uncommon path requires rtnl_lock(), just check
ahead of time whether dev_uc_add() would result into a call to
__dev_set_promiscuity(), and handle that condition separately.
DSA already configures the master interface to be promiscuous if the
tagger requires this. We can extend this to also cover the case where
the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT),
and on the premise that we'd end up making it promiscuous during
operation anyway, either if a DSA slave has a non-inherited MAC address,
or if the bridge notifies local FDB entries for its own MAC address, the
address of a station learned on a foreign port, etc.
Fixes: da7378445a62 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: microchip: fix bridging with more than two member ports
Commit ae0f85a22d68 ("net: dsa: microchip: implement multi-bridge support")
plugged a packet leak between ports that were members of different bridges.
Unfortunately, this broke another use case, namely that of more than two
ports that are members of the same bridge.
After that commit, when a port is added to a bridge, hardware bridging
between other member ports of that bridge will be cleared, preventing
packet exchange between them.
Fix by ensuring that the Port VLAN Membership bitmap includes any existing
ports in the bridge, not just the port being added.
Fixes: ae0f85a22d68 ("net: dsa: microchip: implement multi-bridge support") Signed-off-by: Svenning Sørensen <sss@secomea.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Christophe Leroy [Thu, 17 Feb 2022 13:35:49 +0000 (14:35 +0100)]
net: Force inlining of checksum functions in net/checksum.h
All functions defined as static inline in net/checksum.h are
meant to be inlined for performance reason.
But since commit bfab67995487 ("compiler: enable
CONFIG_OPTIMIZE_INLINING forcibly") the compiler is allowed to
uninline functions when it wants.
Fair enough in the general case, but for tiny performance critical
checksum helpers that's counter-productive.
The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE,
Those helpers being 'static inline' in header files you suddenly find
them duplicated many times in the resulting vmlinux.
Here is a typical exemple when building powerpc pmac32_defconfig
with CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times:
David S. Miller [Sat, 19 Feb 2022 12:28:01 +0000 (12:28 +0000)]
Merge branch 'mptcp-fixes'
Mat Martineau says:
====================
mptcp: Fix address advertisement races and stabilize tests
Patches 1, 2, and 7 modify two self tests to give consistent, accurate
results by fixing timing issues and accounting for syncookie behavior.
Paches 3-6 fix two races in overlapping address advertisement send and
receive. Associated self tests are updated, including addition of two
MIBs to enable testing and tracking dropped address events.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:44 +0000 (13:35 -0800)]
selftests: mptcp: be more conservative with cookie MPJ limits
Since commit 360b6444f049 ("mptcp: remote addresses fullmesh"), an
MPTCP client can attempt creating multiple MPJ subflow simultaneusly.
In such scenario the server, when syncookies are enabled, could end-up
accepting incoming MPJ syn even above the configured subflow limit, as
the such limit can be enforced in a reliable way only after the subflow
creation. In case of syncookie, only after the 3rd ack reception.
As a consequence the related self-tests case sporadically fails, as it
verify that the server always accept the expected number of MPJ syn.
Address the issues relaxing the MPJ syn number constrain. Note that the
check on the accepted number of MPJ 3rd ack still remains intact.
Fixes: 360b6444f049 ("mptcp: remote addresses fullmesh") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:43 +0000 (13:35 -0800)]
selftests: mptcp: more robust signal race test
The in kernel MPTCP PM implementation can process a single
incoming add address option at any given time. In the
mentioned test the server can surpass such limit. Let the
setup cope with that allowing a faster add_addr retransmission.
Fixes: 02fcb7437966 ("mptcp: do not block subflows creation on errors") Fixes: f08d1fe793c6 ("mptcp: drop argument port from mptcp_pm_announce_addr") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:42 +0000 (13:35 -0800)]
mptcp: add mibs counter for ignored incoming options
The MPTCP in kernel path manager has some constraints on incoming
addresses announce processing, so that in edge scenarios it can
end-up dropping (ignoring) some of such announces.
The above is not very limiting in practice since such scenarios are
very uncommon and MPTCP will recover due to ADD_ADDR retransmissions.
This patch adds a few MIB counters to account for such drop events
to allow easier introspection of the critical scenarios.
Fixes: f08d1fe793c6 ("mptcp: drop argument port from mptcp_pm_announce_addr") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:41 +0000 (13:35 -0800)]
mptcp: fix race in incoming ADD_ADDR option processing
If an MPTCP endpoint received multiple consecutive incoming
ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite
the current remote address value after the PM lock is released
in mptcp_pm_nl_add_addr_received() and before such address
is echoed.
Fix the issue caching the remote address value a little earlier
and always using the cached value after releasing the PM lock.
Fixes: f08d1fe793c6 ("mptcp: drop argument port from mptcp_pm_announce_addr") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:40 +0000 (13:35 -0800)]
mptcp: fix race in overlapping signal events
After commit 02fcb7437966 ("mptcp: do not block subflows
creation on errors"), if a signal address races with a failing
subflow creation, the subflow creation failure control path
can trigger the selection of the next address to be announced
while the current announced is still pending.
The above will cause the unintended suppression of the ADD_ADDR
announce.
Fix the issue skipping the to-be-suppressed announce before it
will mark an endpoint as already used. The relevant announce
will be triggered again when the current one will complete.
Fixes: 02fcb7437966 ("mptcp: do not block subflows creation on errors") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:39 +0000 (13:35 -0800)]
selftests: mptcp: improve 'fair usage on close' stability
The mentioned test has to wait for a subflow creation failure.
The current code looks for TCP sockets in TW state and sometimes
misses the relevant event. Switch to a more stable check, looking
for the associated mib counter.
Fixes: 4359e3801087 ("selftests: mptcp: add tests for subflow creation failure") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/257 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 18 Feb 2022 21:35:38 +0000 (13:35 -0800)]
selftests: mptcp: fix diag instability
Instead of waiting for an arbitrary amount of time for the MPTCP
MP_CAPABLE handshake to complete, explicitly wait for the relevant
socket to enter into the established status.
Additionally let the data transfer application use the slowest
transfer mode available (-r), to cope with very slow host, or
high jitter caused by hosting VMs.
Fixes: 6aaaa43642d2 ("selftests/mptcp: add diag interface tests") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeremy Linton [Mon, 14 Feb 2022 23:18:52 +0000 (17:18 -0600)]
net: mvpp2: always set port pcs ops
Booting a MACCHIATObin with 5.17, the system OOPs with
a null pointer deref when the network is started. This
is caused by the pcs->ops structure being null in
mcpp2_acpi_start() when it tries to call pcs_config().
Hoisting the code which sets pcs_gmac.ops and pcs_xlg.ops,
assuring they are always set, fixes the problem.
Fixes: fe4aba894db9 ("net: mvpp2: use .mac_select_pcs() interface") Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/r/20220214231852.3331430-1-jeremy.linton@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tom Rix [Mon, 14 Feb 2022 15:40:43 +0000 (07:40 -0800)]
ice: initialize local variable 'tlv'
Clang static analysis reports this issues
ice_common.c:5008:21: warning: The left expression of the compound
assignment is an uninitialized value. The computed value will
also be garbage
ldo->phy_type_low |= ((u64)buf << (i * 16));
~~~~~~~~~~~~~~~~~ ^
When called from ice_cfg_phy_fec() ldo is the uninitialized local
variable tlv. So initialize.
Fixes: 41d02deef93d ("ice: add link lenient and default override support") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tom Rix [Mon, 14 Feb 2022 14:33:27 +0000 (06:33 -0800)]
ice: check the return of ice_ptp_gettimex64
Clang static analysis reports this issue
time64.h:69:50: warning: The left operand of '+'
is a garbage value
set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec,
~~~~~~~~~~ ^
In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now'
is set by ice_ptp_gettimex64(). This function can fail
with -EBUSY, so 'now' can have a gargbage value.
So check the return.
Fixes: 6e97854adf8d ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jacob Keller [Mon, 7 Feb 2022 18:23:29 +0000 (10:23 -0800)]
ice: fix concurrent reset and removal of VFs
Commit fb72aaf6411c ("ice: Stop processing VF messages during teardown")
introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
intended to prevent some issues with concurrently handling messages from
VFs while tearing down the VFs.
This change was motivated by crashes caused while tearing down and
bringing up VFs in rapid succession.
It turns out that the fix actually introduces issues with the VF driver
caused because the PF no longer responds to any messages sent by the VF
during its .remove routine. This results in the VF potentially removing
its DMA memory before the PF has shut down the device queues.
Additionally, the fix doesn't actually resolve concurrency issues within
the ice driver. It is possible for a VF to initiate a reset just prior
to the ice driver removing VFs. This can result in the remove task
concurrently operating while the VF is being reset. This results in
similar memory corruption and panics purportedly fixed by that commit.
Fix this concurrency at its root by protecting both the reset and
removal flows using the existing VF cfg_lock. This ensures that we
cannot remove the VF while any outstanding critical tasks such as a
virtchnl message or a reset are occurring.
This locking change also fixes the root cause originally fixed by commit fb72aaf6411c ("ice: Stop processing VF messages during teardown"), so we
can simply revert it.
Note that I kept these two changes together because simply reverting the
original commit alone would leave the driver vulnerable to worse race
conditions.
Fixes: fb72aaf6411c ("ice: Stop processing VF messages during teardown") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Wojciech Drewek [Fri, 17 Dec 2021 11:36:25 +0000 (12:36 +0100)]
ice: Match on all profiles in slow-path
In switchdev mode, slow-path rules need to match all protocols, in order
to correctly redirect unfiltered or missed packets to the uplink. To set
this up for the virtual function to uplink flow, the rule that redirects
packets to the control VSI must have the tunnel type set to
ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set,
ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all
profiles would be selected for this rule, resulting in the desired
behavior. Without this change slow-path would not work with
tunnel protocols.
Fixes: d8d7095ca94a ("ice: low level support for tunnels") Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Xiaoke Wang [Fri, 18 Feb 2022 02:19:39 +0000 (10:19 +0800)]
net: ll_temac: check the return value of devm_kmalloc()
devm_kmalloc() returns a pointer to allocated memory on success, NULL
on failure. While lp->indirect_lock is allocated by devm_kmalloc()
without proper check. It is better to check the value of it to
prevent potential wrong memory access.
Fixes: abb9098d3898 ("net: ll_temac: Support indirect_mutex share within TEMAC IP") Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 17 Feb 2022 17:05:02 +0000 (09:05 -0800)]
net-timestamp: convert sk->sk_tskey to atomic_t
UDP sendmsg() can be lockless, this is causing all kinds
of data races.
This patch converts sk->sk_tskey to remove one of these races.
BUG: KCSAN: data-race in __ip_append_data / __ip_append_data
read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1:
__ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994
ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0:
__ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994
ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636
udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249
inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x0000054d -> 0x0000054e
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85fa6f-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 4597b1b6fd15 ("net-timestamp: add key to disambiguate concurrent datagrams") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Thu, 17 Feb 2022 13:10:44 +0000 (14:10 +0100)]
sr9700: sanity check for packet length
A malicious device can leak heap data to user space
providing bogus frame lengths. Introduce a sanity check.
Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Thu, 17 Feb 2022 09:30:48 +0000 (11:30 +0200)]
net/sched: act_ct: Fix flow table lookup after ct clear or switching zones
Flow table lookup is skipped if packet either went through ct clear
action (which set the IP_CT_UNTRACKED flag on the packet), or while
switching zones and there is already a connection associated with
the packet. This will result in no SW offload of the connection,
and the and connection not being removed from flow table with
TCP teardown (fin/rst packet).
To fix the above, remove these unneccary checks in flow
table lookup.
Fixes: c936b1fcf91e ("net/sched: act_ct: Software offload of established flows") Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Thu, 17 Feb 2022 01:43:03 +0000 (09:43 +0800)]
drivers: hamradio: 6pack: fix UAF bug caused by mod_timer()
When a 6pack device is detaching, the sixpack_close() will act to cleanup
necessary resources. Although del_timer_sync() in sixpack_close()
won't return if there is an active timer, one could use mod_timer() in
sp_xmit_on_air() to wake up timer again by calling userspace syscall such
as ax25_sendmsg(), ax25_connect() and ax25_ioctl().
This unexpected waked handler, sp_xmit_on_air(), realizes nothing about
the undergoing cleanup and may still call pty_write() to use driver layer
resources that have already been released.
One of the possible race conditions is shown below:
The corresponding fail log is shown below:
===============================================================
BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470
Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0
...
Call Trace:
...
queue_work_on+0x3f/0x50
pty_write+0xcd/0xe0pty_write+0xcd/0xe0
sp_xmit_on_air+0xb2/0x1f0
call_timer_fn+0x28/0x150
__run_timers.part.0+0x3c2/0x470
run_timer_softirq+0x3b/0x80
__do_softirq+0xf1/0x380
...
This patch reorders the del_timer_sync() after the unregister_netdev()
to avoid UAF bugs. Because the unregister_netdev() is well synchronized,
it flushs out any pending queues, waits the refcount of net_device
decreases to zero and removes net_device from kernel. There is not any
running routines after executing unregister_netdev(). Therefore, we could
not arouse timer from userspace again.
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
We've added 8 non-merge commits during the last 7 day(s) which contain
a total of 8 files changed, 119 insertions(+), 15 deletions(-).
The main changes are:
1) Add schedule points in map batch ops, from Eric.
2) Fix bpf_msg_push_data with len 0, from Felix.
3) Fix crash due to incorrect copy_map_value, from Kumar.
4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar.
5) Fix a bpf_timer initialization issue with clang, from Yonghong.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add schedule points in batch ops
bpf: Fix crash due to out of bounds access into reg2btf_ids.
selftests: bpf: Check bpf_msg_push_data return value
bpf: Fix a bpf_timer initialization issue
bpf: Emit bpf_timer in vmlinux BTF
selftests/bpf: Add test for bpf_timer overwriting crash
bpf: Fix crash due to incorrect copy_map_value
bpf: Do not try bpf_msg_push_data with len 0
====================
Linus Torvalds [Thu, 17 Feb 2022 19:33:59 +0000 (11:33 -0800)]
Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and netfilter.
Current release - regressions:
- dsa: lantiq_gswip: fix use after free in gswip_remove()
- smc: avoid overwriting the copies of clcsock callback functions
Current release - new code bugs:
- iwlwifi:
- fix use-after-free when no FW is present
- mei: fix the pskb_may_pull check in ipv4
- mei: retry mapping the shared area
- mvm: don't feed the hardware RFKILL into iwlmei
Previous releases - regressions:
- ipv6: mcast: use rcu-safe version of ipv6_get_lladdr()
- tipc: fix wrong publisher node address in link publications
- iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW
assertion
- bgmac: make idm and nicpm resource optional again
- atl1c: fix tx timeout after link flap
Previous releases - always broken:
- vsock: remove vsock from connected table when connect is
interrupted by a signal
- ping: change destination interface checks to match raw sockets
- crypto: af_alg - get rid of alg_memory_allocated to avoid confusing
semantics (and null-deref) after SO_RESERVE_MEM was added
- ipv6: make exclusive flowlabel checks per-netns
- bonding: force carrier update when releasing slave
- sched: limit TC_ACT_REPEAT loops
- bridge: multicast: notify switchdev driver whenever MC processing
gets disabled because of max entries reached
- wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found
- iwlwifi: fix locking when "HW not ready"
- phy: mediatek: remove PHY mode check on MT7531
- dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN
- dsa: lan9303:
- fix polarity of reset during probe
- fix accelerated VLAN handling"
* tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits)
bonding: force carrier update when releasing slave
nfp: flower: netdev offload check for ip6gretap
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
ipv4: fix data races in fib_alias_hw_flags_set
net: dsa: lan9303: add VLAN IDs to master device
net: dsa: lan9303: handle hwaccel VLAN tags
vsock: remove vsock from connected table when connect is interrupted by a signal
Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname"
ping: fix the dif and sdif check in ping_lookup
net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990
net: sched: limit TC_ACT_REPEAT loops
tipc: fix wrong notification node addresses
net: dsa: lantiq_gswip: fix use after free in gswip_remove()
ipv6: per-netns exclusive flowlabel checks
net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled
CDC-NCM: avoid overflow in sanity checking
mctp: fix use after free
net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
bonding: fix data-races around agg_select_timer
dpaa2-eth: Initialize mutex used in one step timestamping path
...
Zhang Changzhong [Wed, 16 Feb 2022 14:18:08 +0000 (22:18 +0800)]
bonding: force carrier update when releasing slave
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Commit bd6ea65287c7 ("Fix regression due to "fs: move binfmt_misc sysctl
to its own file") fixed a regression, however it failed to add a
kmemleak_not_leak().
Fixes: bd6ea65287c7 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file") Reported-by: Tong Zhang <ztong0001@gmail.com> Cc: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to
the CPU iterator
- Sync linux/perf_event.h UAPI with the kernel sources
- Update Jiri Olsa's email address in MAINTAINERS
* tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf: Defer freeing string after possible strlen() on it
perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization
libsubcmd: Fix use-after-free for realloc(..., 0)
libperf: Fix perf_cpu_map__for_each_cpu macro
perf cs-etm: Fix corrupt inject files when only last branch option is enabled
perf cs-etm: No-op refactor of synth opt usage
libperf: Fix 32-bit build for tests uint64_t printf
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
perf trace: Avoid early exit due SIGCHLD from non-workload processes
MAINTAINERS: Update Jiri's email address
Danie du Toit [Thu, 17 Feb 2022 12:48:20 +0000 (14:48 +0200)]
nfp: flower: netdev offload check for ip6gretap
IPv6 GRE tunnels are not being offloaded, this is caused by a missing
netdev offload check. The functionality of IPv6 GRE tunnel offloading
was previously added but this check was not included. Adding the
ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly.
Fixes: 57d5a5cb676b ("nfp: flower: Allow ipv6gretap interface for offloading") Signed-off-by: Danie du Toit <danie.dutoit@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 16 Feb 2022 17:32:17 +0000 (09:32 -0800)]
ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt
Because fib6_info_hw_flags_set() is called without any synchronization,
all accesses to gi6->offload, fi->trap and fi->offload_failed
need some basic protection like READ_ONCE()/WRITE_ONCE().
BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt
write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1:
fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230
nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline]
nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline]
nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline]
nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline]
nsim_fib_event_work+0x284f/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30
value changed: 0x22 -> 0x2a
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 323fdfa3a82c ("IPv6: Add "offload failed" indication to routes") Fixes: 4fb4b908ee50 ("ipv6: Add "offload" and "trap" indications to routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amit Cohen <amcohen@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Wed, 16 Feb 2022 17:32:16 +0000 (09:32 -0800)]
ipv4: fix data races in fib_alias_hw_flags_set
fib_alias_hw_flags_set() can be used by concurrent threads,
and is only RCU protected.
We need to annotate accesses to following fields of struct fib_alias:
offload, trap, offload_failed
Because of READ_ONCE()WRITE_ONCE() limitations, make these
field u8.
BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set
read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1:
fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050
nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
process_scheduled_works kernel/workqueue.c:2370 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0:
fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054
nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline]
nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
process_scheduled_works kernel/workqueue.c:2370 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2456
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
value changed: 0x00 -> 0x02
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 371b0fd7623c ("ipv4: Add "offload" and "trap" indications to routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mans Rullgard [Wed, 16 Feb 2022 20:48:18 +0000 (20:48 +0000)]
net: dsa: lan9303: add VLAN IDs to master device
If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received. Do this in the
port_enable() function, and remove them in port_disable().
Fixes: 3bbb5663f4fc ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mans Rullgard [Wed, 16 Feb 2022 12:46:34 +0000 (12:46 +0000)]
net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present. Otherwise,
use __skb_vlan_pop() like the other tag parsers do. This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: 3bbb5663f4fc ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 17 Feb 2022 16:57:47 +0000 (08:57 -0800)]
mm: don't try to NUMA-migrate COW pages that have other uses
Oded Gabbay reports that enabling NUMA balancing causes corruption with
his Gaudi accelerator test load:
"All the details are in the bug, but the bottom line is that somehow,
this patch causes corruption when the numa balancing feature is
enabled AND we don't use process affinity AND we use GUP to pin pages
so our accelerator can DMA to/from system memory.
Either disabling numa balancing, using process affinity to bind to
specific numa-node or reverting this patch causes the bug to
disappear"
and Oded bisected the issue to commit a103ba5cc916 ("mm: do_wp_page()
simplification").
Now, the NUMA balancing shouldn't actually be changing the writability
of a page, and as such shouldn't matter for COW. But it appears it
does. Suspicious.
However, regardless of that, the condition for enabling NUMA faults in
change_pte_range() is nonsensical. It uses "page_mapcount(page)" to
decide if a COW page should be NUMA-protected or not, and that makes
absolutely no sense.
The number of mappings a page has is irrelevant: not only does GUP get a
reference to a page as in Oded's case, but the other mappings migth be
paged out and the only reference to them would be in the page count.
Since we should never try to NUMA-balance a page that we can't move
anyway due to other references, just fix the code to use 'page_count()'.
Oded confirms that that fixes his issue.
Now, this does imply that something in NUMA balancing ends up changing
page protections (other than the obvious one of making the page
inaccessible to get the NUMA faulting information). Otherwise the COW
simplification wouldn't matter - since doing the GUP on the page would
make sure it's writable.
The cause of that permission change would be good to figure out too,
since it clearly results in spurious COW events - but fixing the
nonsensical test that just happened to work before is obviously the
CorrectThing(tm) to do regardless.
Seth Forshee [Thu, 17 Feb 2022 14:13:12 +0000 (08:13 -0600)]
vsock: remove vsock from connected table when connect is interrupted by a signal
vsock_connect() expects that the socket could already be in the
TCP_ESTABLISHED state when the connecting task wakes up with a signal
pending. If this happens the socket will be in the connected table, and
it is not removed when the socket state is reset. In this situation it's
common for the process to retry connect(), and if the connection is
successful the socket will be added to the connected table a second
time, corrupting the list.
Prevent this by calling vsock_remove_connected() if a signal is received
while waiting for a connection. This is harmless if the socket is not in
the connected table, and if it is in the table then removing it will
prevent list corruption from a double add.
Note for backporting: this patch requires dee01a9904fd ("vsock: correct
removal of socket from the list"), which is in all current stable trees
except 4.9.y.
Since idm_base and nicpm_base are still optional resources not present
on all platforms, this breaks the driver for everything except Northstar
2 (which has both).
The same change was already reverted once with bb93c3c4fae3 ("net:
broadcom: fix a mistake about ioremap resource").
So let's do it again.
Fixes: 8b375ede3c51 ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
[florian: Added comments to explain the resources are optional] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Xin Long [Wed, 16 Feb 2022 05:20:52 +0000 (00:20 -0500)]
ping: fix the dif and sdif check in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# ip addr add 192.168.111.1/24 dev dummy0
# ping -I dummy0 192.168.111.1 -c1
The issue was also reported on:
https://github.com/iputils/iputils/issues/104
But fixed in iputils in a wrong way by not binding to device when
destination IP is on device, and it will cause some of kselftests
to fail, as Jianlin noticed.
This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.
Fixes: cfce3383c7be ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
perf bpf: Defer freeing string after possible strlen() on it
This was detected by the gcc in Fedora Rawhide's gcc:
50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC)
inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9:
util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free]
1225 | *key_scan_pos += strlen(map_opt);
| ^~~~~~~~~~~~~~~
util/bpf-loader.c:1223:9: note: call to 'free' here
1223 | free(map_name);
| ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
So do the calculations on the pointer before freeing it.
Fixes: 68eaf1530bf9810b ("perf bpf-loader: Add missing '*' for key_scan_pos") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jon Maloy [Wed, 16 Feb 2022 02:00:09 +0000 (21:00 -0500)]
tipc: fix wrong notification node addresses
The previous bug fix had an unfortunate side effect that broke
distribution of binding table entries between nodes. The updated
tipc_sock_addr struct is also used further down in the same
function, and there the old value is still the correct one.
Willem de Bruijn [Tue, 15 Feb 2022 16:00:37 +0000 (11:00 -0500)]
ipv6: per-netns exclusive flowlabel checks
Ipv6 flowlabels historically require a reservation before use.
Optionally in exclusive mode (e.g., user-private).
Commit 0e86d7cdae9f ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.
That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.
Changes
v2
- wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
Whenever bridge driver hits the max capacity of MDBs, it disables
the MC processing (by setting corresponding bridge option), but never
notifies switchdev about such change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).
This could lead to situation when Software MDB processing gets disabled,
but this event never gets offloaded to the underlying Hardware.
bpf: Fix crash due to out of bounds access into reg2btf_ids.
When commit 51b3c62c364b ("bpf: Support bpf program calling kernel function") added
kfunc support, it defined reg2btf_ids as a cheap way to translate the verifier
reg type to the appropriate btf_vmlinux BTF ID, however
commit dd105e1e7aeb ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL")
moved the __BPF_REG_TYPE_MAX from the last member of bpf_reg_type enum to after
the base register types, and defined other variants using type flag
composition. However, now, the direct usage of reg->type to index into
reg2btf_ids may no longer fall into __BPF_REG_TYPE_MAX range, and hence lead to
out of bounds access and kernel crash on dereference of bad pointer.
Linus Torvalds [Tue, 15 Feb 2022 23:28:00 +0000 (15:28 -0800)]
tty: n_tty: do not look ahead for EOL character past the end of the buffer
Daniel Gibson reports that the n_tty code gets line termination wrong in
very specific cases:
"If you feed a line with exactly 64 chars + terminating newline, and
directly afterwards (without reading) another line into a pseudo
terminal, the the first read() on the other side will return the 64
char line *without* terminating newline, and the next read() will
return the missing terminating newline AND the complete next line (if
it fits in the buffer)"
and bisected the behavior to commit 3df8f9a82868 ("tty: convert
tty_ldisc_ops 'read()' function to take a kernel pointer").
Now, digging deeper, it turns out that the behavior isn't exactly new:
what changed in commit 3df8f9a82868 was that the tty line discipline
.read() function is now passed an intermediate kernel buffer rather than
the final user space buffer.
And that intermediate kernel buffer is 64 bytes in size - thus that
special case with exactly 64 bytes plus terminating newline.
The same problem did exist before, but historically the boundary was not
the 64-byte chunk, but the user-supplied buffer size, which is obviously
generally bigger (and potentially bigger than N_TTY_BUF_SIZE, which
would hide the issue entirely).
The reason is that the n_tty canon_copy_from_read_buf() code would look
ahead for the EOL character one byte further than it would actually
copy. It would then decide that it had found the terminator, and unmark
it as an EOL character - which in turn explains why the next read
wouldn't then be terminated by it.
Now, the reason it did all this in the first place is related to some
historical and pretty obscure EOF behavior, see commit 2943ef2ace49
("n_tty: Fix poll() after buffer-limited eof push read") and commit c6209b192a52 ("n_tty: Fix EOF push handling").
And the reason for the EOL confusion is that we treat EOF as a special
EOL condition, with the EOL character being NUL (aka "__DISABLED_CHAR"
in the kernel sources).
So that EOF look-ahead also affects the normal EOL handling.
This patch just removes the look-ahead that causes problems, because EOL
is much more critical than the historical "EOF in the middle of a line
that coincides with the end of the buffer" handling ever was.
Now, it is possible that we should indeed re-introduce the "look at next
character to see if it's a EOF" behavior, but if so, that should be done
not at the kernel buffer chunk boundary in canon_copy_from_read_buf(),
but at a higher level, when we run out of the user buffer.
In particular, the place to do that would be at the top of
'n_tty_read()', where we check if it's a continuation of a previously
started read, and there is no more buffer space left, we could decide to
just eat the __DISABLED_CHAR at that point.
But that would be a separate patch, because I suspect nobody actually
cares, and I'd like to get a report about it before bothering.
Fixes: 3df8f9a82868 ("tty: convert tty_ldisc_ops 'read()' function to take a kernel pointer") Fixes: 2943ef2ace49 ("n_tty: Fix poll() after buffer-limited eof push read") Fixes: c6209b192a52 ("n_tty: Fix EOF push handling") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215611 Reported-and-tested-by: Daniel Gibson <metalcaedes@gmail.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The struct perf_event_attr is initialised differently in Arm64 when
recording in call-graph fp mode, so update the relevant tests, and add
two extra arm64-only tests.
Before:
$ perf test 17 -v
17: Setup struct perf_event_attr
[...]
running './tests/attr/test-record-graph-default'
expected sample_type=295, got 4391
expected sample_regs_user=0, got 1073741824
FAILED './tests/attr/test-record-graph-default' - match failure
test child finished with -1
---- end ----
After:
[...]
running './tests/attr/test-record-graph-default-aarch64'
test limitation 'aarch64'
running './tests/attr/test-record-graph-fp-aarch64'
test limitation 'aarch64'
running './tests/attr/test-record-graph-default'
test limitation '!aarch64'
excluded architecture list ['aarch64']
skipped [aarch64] './tests/attr/test-record-graph-default'
running './tests/attr/test-record-graph-fp'
test limitation '!aarch64'
excluded architecture list ['aarch64']
skipped [aarch64] './tests/attr/test-record-graph-fp'
[...]
Fixes: 2f17bf9716adec46 ("perf tools: Record ARM64 LR register automatically") Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20220125104435.2737-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kees Cook [Sun, 13 Feb 2022 18:24:43 +0000 (10:24 -0800)]
libsubcmd: Fix use-after-free for realloc(..., 0)
GCC 12 correctly reports a potential use-after-free condition in the
xrealloc helper. Fix the warning by avoiding an implicit "free(ptr)"
when size == 0:
In file included from help.c:12:
In function 'xrealloc',
inlined from 'add_cmdname' at help.c:24:2: subcmd-util.h:56:23: error: pointer may be used after 'realloc' [-Werror=use-after-free]
56 | ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
52 | void *ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
subcmd-util.h:58:31: error: pointer may be used after 'realloc' [-Werror=use-after-free]
58 | ret = realloc(ptr, 1);
| ^~~~~~~~~~~~~~~
subcmd-util.h:52:21: note: call to 'realloc' here
52 | void *ret = realloc(ptr, size);
| ^~~~~~~~~~~~~~~~~~
Jiri Olsa [Tue, 15 Feb 2022 15:37:13 +0000 (16:37 +0100)]
libperf: Fix perf_cpu_map__for_each_cpu macro
Tzvetomir Stoyanov reported an issue with using macro
perf_cpu_map__for_each_cpu using private perf_cpu object.
The issue is caused by recent change that wrapped cpu in struct perf_cpu
to distinguish it from cpu indexes. We need to make struct perf_cpu
public.
Add a simple test for using the perf_cpu_map__for_each_cpu macro.
Fixes: 446c8a88b5bb1910 ("perf cpumap: Give CPUs their own type") Reported-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220215153713.31395-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Thu, 10 Feb 2022 20:06:20 +0000 (20:06 +0000)]
perf cs-etm: Fix corrupt inject files when only last branch option is enabled
'perf inject' with Coresight data generates files that cannot be opened
when only the last branch option is specified:
perf inject -i perf.data --itrace=l -o inject.data
perf script -i inject.data
0x33faa8 [0x8]: failed to process type: 9 [Bad address]
This is because cs_etm__synth_instruction_sample() is called even when
the sample type for instructions hasn't been setup. Last branch records
are attached to instruction samples so it doesn't make sense to generate
them when --itrace=i isn't specified anyway.
This change disables all calls of cs_etm__synth_instruction_sample()
unless --itrace=i is specified, resulting in a file with no samples if
only --itrace=l is provided, rather than a bad file.
Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220210200620.1227232-2-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark [Thu, 10 Feb 2022 20:06:19 +0000 (20:06 +0000)]
perf cs-etm: No-op refactor of synth opt usage
sample_branches and sample_instructions are already saved in the
synth_opts struct. Other usages like synth_opts.last_branch don't save a
value, so make this more consistent by always going through synth_opts
and not saving duplicate values.
Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20220210200620.1227232-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rob Herring [Tue, 1 Feb 2022 21:39:03 +0000 (15:39 -0600)]
libperf: Fix 32-bit build for tests uint64_t printf
Commit 8a3f5512c36cb74a ("libperf tests: Add test_stat_multiplexing test")
added printf's of 64-bit ints using %lu which doesn't work on 32-bit
builds:
tests/test-evlist.c:529:29: error: format ‘%lu’ expects argument of type \
‘long unsigned int’, but argument 4 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=]
Use PRIu64 instead which works on both 32-bit and 64-bit systems.
Fixes: 8a3f5512c36cb74a ("libperf tests: Add test_stat_multiplexing test") Signed-off-by: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Link: https://lore.kernel.org/r/20220201213903.699656-1-robh@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools headers UAPI: Sync linux/perf_event.h with the kernel sources
To pick the trivial change in:
990d1c7705791614 ("perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures")
Just adds a comment.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/perf_event.h' differs from latest version at 'include/uapi/linux/perf_event.h'
diff -u tools/include/uapi/linux/perf_event.h include/uapi/linux/perf_event.h
Cc: Marco Elver <elver@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Changbin Du <changbin.du@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220208140725.3947-1-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Tue, 15 Feb 2022 19:07:59 +0000 (11:07 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Read HW interrupt pending state from the HW
x86:
- Don't truncate the performance event mask on AMD
- Fix Xen runstate updates to be atomic when preempting vCPU
- Fix for AMD AVIC interrupt injection race
- Several other AMD fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW
KVM: x86/pmu: Don't truncate the PerfEvtSeln MSR when creating a perf event
KVM: SVM: fix race between interrupt delivery and AVIC inhibition
KVM: SVM: set IRR in svm_deliver_interrupt
KVM: SVM: extract avic_ring_doorbell
selftests: kvm: Remove absent target file
KVM: arm64: vgic: Read HW interrupt pending state from the HW
KVM: x86/xen: Fix runstate updates to be atomic when preempting vCPU
KVM: x86: SVM: move avic definitions from AMD's spec to svm.h
KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it
KVM: x86: nSVM: deal with L1 hypervisor that intercepts interrupts but lets L2 control them
KVM: x86: nSVM: expose clean bit support to the guest
KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM
KVM: x86: nSVM: mark vmcb01 as dirty when restoring SMM saved state
KVM: x86: nSVM: fix potential NULL derefernce on nested migration
KVM: x86: SVM: don't passthrough SMAP/SMEP/PKE bits in !NPT && !gCR0.PG case
Revert "svm: Add warning message for AVIC IPI invalid target"
Linus Torvalds [Tue, 15 Feb 2022 18:52:05 +0000 (10:52 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- memory leak fix for hid-elo driver (Dongliang Mu)
- fix for hangs on newer AMD platforms with amd_sfh-driven hardware
(Basavaraj Natikar )
- locking fix in i2c-hid (Daniel Thompson)
- a few device-ID specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: amd_sfh: Add interrupt handler to process interrupts
HID: amd_sfh: Add functionality to clear interrupts
HID: amd_sfh: Disable the interrupt for all command
HID: amd_sfh: Correct the structure field name
HID: amd_sfh: Handle amd_sfh work buffer in PM ops
HID:Add support for UGTABLET WP5540
HID: amd_sfh: Add illuminance mask to limit ALS max value
HID: amd_sfh: Increase sensor command timeout
HID: i2c-hid: goodix: Fix a lockdep splat
HID: elo: fix memory leak in elo_probe
HID: apple: Set the tilde quirk flag on the Wellspring 5 and later
Felix Maurer [Fri, 11 Feb 2022 17:43:36 +0000 (18:43 +0100)]
selftests: bpf: Check bpf_msg_push_data return value
bpf_msg_push_data may return a non-zero value to indicate an error. The
return value should be checked to prevent undetected errors.
To indicate an error, the BPF programs now perform a different action
than their intended one to make the userspace test program notice the
error, i.e., the programs supposed to pass/redirect drop, the program
supposed to drop passes.
Linus Torvalds [Tue, 15 Feb 2022 17:14:05 +0000 (09:14 -0800)]
Merge tag 'for-5.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- yield CPU more often when defragmenting a large file
- skip defragmenting extents already under writeback
- improve error message when send fails to write file data
- get rid of warning when mounted with 'flushoncommit'
* tag 'for-5.17-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: send: in case of IO error log it
btrfs: get rid of warning on transaction commit when using flushoncommit
btrfs: defrag: don't try to defrag extents which are under writeback
btrfs: don't hold CPU for too long when defragging a file
Linus Torvalds [Tue, 15 Feb 2022 17:10:09 +0000 (09:10 -0800)]
Merge tag 'for-5.17/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
- Fix miscompilations when function calls are made from inside a
put_user() call
- Drop __init from map_pages() declaration to avoid random boot crashes
- Added #error messages if a 64-bit compiler was used to build a 32-bit
kernel (and vice versa)
- Fix out-of-bound data TLB miss faults in sba_iommu and ccio-dma
drivers
- Add ioread64_lo_hi() and iowrite64_lo_hi() functions to avoid kernel
test robot errors
- Fix link failure when 8250_gsc driver is built without CONFIG_IOSAPIC
* tag 'for-5.17/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
serial: parisc: GSC: fix build when IOSAPIC is not set
parisc: Fix some apparent put_user() failures
parisc: Show error if wrong 32/64-bit compiler is being used
parisc: Add ioread64_lo_hi() and iowrite64_lo_hi()
parisc: Fix sglist access in ccio-dma.c
parisc: Fix data TLB miss in sba_unmap_sg
parisc: Drop __init from map_pages declaration
Linus Torvalds [Tue, 15 Feb 2022 17:05:01 +0000 (09:05 -0800)]
Merge tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Rework use of DMA_BIT_MASK in vmbus to work around a clang bug
(Michael Kelley)
- Fix NUMA topology (Long Li)
- Fix a memory leak in vmbus (Miaoqian Lin)
- One minor clean-up patch (Cai Huoqing)
* tag 'hyperv-fixes-signed-20220215' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: utils: Make use of the helper macro LIST_HEAD()
Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64)
Drivers: hv: vmbus: Fix memory leak in vmbus_add_channel_kobj
PCI: hv: Fix NUMA node assignment when kernel boots with custom NUMA topology
Oliver Neukum [Tue, 15 Feb 2022 10:35:47 +0000 (11:35 +0100)]
CDC-NCM: avoid overflow in sanity checking
A broken device may give an extreme offset like 0xFFF0
and a reasonable length for a fragment. In the sanity
check as formulated now, this will create an integer
overflow, defeating the sanity check. Both offset
and offset + len need to be checked in such a manner
that no overflow can occur.
And those quantities should be unsigned.
Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Rix [Tue, 15 Feb 2022 02:05:41 +0000 (18:05 -0800)]
mctp: fix use after free
Clang static analysis reports this problem
route.c:425:4: warning: Use of memory after it is freed
trace_mctp_key_acquire(key);
^~~~~~~~~~~~~~~~~~~~~~~~~~~
When mctp_key_add() fails, key is freed but then is later
used in trace_mctp_key_acquire(). Add an else statement
to use the key only when mctp_key_add() is successful.
Fixes: 61aac849e8d4 ("mctp: Add tracepoints for tag/key handling") Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Mon, 14 Feb 2022 23:42:00 +0000 (01:42 +0200)]
net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()
ocelot_vlan_member_del() will free the struct ocelot_bridge_vlan, so if
this is the same as the port's pvid_vlan which we access afterwards,
what we're accessing is freed memory.
Fix the bug by determining whether to clear ocelot_port->pvid_vlan prior
to calling ocelot_vlan_member_del().
Fixes: 15098771f48d ("net: mscc: ocelot: track the port pvid using a pointer") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 14 Feb 2022 19:15:53 +0000 (11:15 -0800)]
bonding: fix data-races around agg_select_timer
syzbot reported that two threads might write over agg_select_timer
at the same time. Make agg_select_timer atomic to fix the races.
BUG: KCSAN: data-race in bond_3ad_initiate_agg_selection / bond_3ad_state_machine_handler
read to 0xffff8881242aea90 of 4 bytes by task 1846 on cpu 1:
bond_3ad_state_machine_handler+0x99/0x2810 drivers/net/bonding/bond_3ad.c:2317
process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
worker_thread+0x616/0xa70 kernel/workqueue.c:2454
kthread+0x1bf/0x1e0 kernel/kthread.c:377
ret_from_fork+0x1f/0x30
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 25910 Comm: syz-executor.1 Tainted: G W 5.17.0-rc4-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Radu Bulie [Mon, 14 Feb 2022 17:45:34 +0000 (19:45 +0200)]
dpaa2-eth: Initialize mutex used in one step timestamping path
1588 Single Step Timestamping code path uses a mutex to
enforce atomicity for two events:
- update of ptp single step register
- transmit ptp event packet
Before this patch the mutex was not initialized. This
caused unexpected crashes in the Tx function.
Fixes: f839b79b5395a ("dpaa2-eth: support PTP Sync packet one-step timestamping") Signed-off-by: Radu Bulie <radu-andrei.bulie@nxp.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Rix [Mon, 14 Feb 2022 15:41:39 +0000 (07:41 -0800)]
dpaa2-switch: fix default return of dpaa2_switch_flower_parse_mirror_key
Clang static analysis reports this representative problem
dpaa2-switch-flower.c:616:24: warning: The right operand of '=='
is a garbage value
tmp->cfg.vlan_id == vlan) {
^ ~~~~
vlan is set in dpaa2_switch_flower_parse_mirror_key(). However
this function can return success without setting vlan. So
change the default return to -EOPNOTSUPP.
Fixes: 33d603c08665 ("dpaa2-switch: add VLAN based mirroring") Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Yunkai [Mon, 14 Feb 2022 03:27:21 +0000 (03:27 +0000)]
ipv4: add description about martian source
When multiple containers are running in the environment and multiple
macvlan network port are configured in each container, a lot of martian
source prints will appear after martian_log is enabled. they are almost
the same, and printed by net_warn_ratelimited. Each arp message will
trigger this print on each network port.
Such as:
IPv4: martian source 173.254.95.16 from 173.254.100.109,
on dev eth0
ll header: 00000000: ff ff ff ff ff ff 40 00 ad fe 64 6d
08 06 ......@...dm..
IPv4: martian source 173.254.95.16 from 173.254.100.109,
on dev eth1
ll header: 00000000: ff ff ff ff ff ff 40 00 ad fe 64 6d
08 06 ......@...dm..
There is no description of this kind of source in the RFC1812.
Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 196619f605c5 ("net: add new socket option SO_RESERVE_MEM") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 15 Feb 2022 14:22:05 +0000 (14:22 +0000)]
Merge tag 'ieee802154-for-net-2022-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
Only a single fix this time.
Miquel Raynal fixed the lifs/sifs periods in the ca82010 to take the actual
symbol duration time into account.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
DENG Qingfang [Wed, 9 Feb 2022 14:39:47 +0000 (22:39 +0800)]
net: phy: mediatek: remove PHY mode check on MT7531
The function mt7531_phy_mode_supported in the DSA driver set supported
mode to PHY_INTERFACE_MODE_GMII instead of PHY_INTERFACE_MODE_INTERNAL
for the internal PHY, so this check breaks the PHY initialization:
mt7530 mdio-bus:00 wan (uninitialized): failed to connect to PHY: -EINVAL
Remove the check to make it work again.
Reported-by: Hauke Mehrtens <hauke@hauke-m.de> Fixes: a9ecc9f85801 ("net: phy: add MediaTek Gigabit Ethernet PHY driver") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Acked-by: Arınç ÜNAL <arinc.unal@arinc9.com> Tested-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 14 Feb 2022 01:38:52 +0000 (20:38 -0500)]
tipc: fix wrong publisher node address in link publications
When a link comes up we add its presence to the name table to make it
possible for users to subscribe for link up/down events. However, after
a previous call signature change the binding is wrongly published with
the peer node as publishing node, instead of the own node as it should
be. This has the effect that the command 'tipc name table show' will
list the link binding (service type 2) with node scope and a peer node
as originator, something that obviously is impossible.
Jiri Olsa [Tue, 8 Feb 2022 22:11:17 +0000 (23:11 +0100)]
MAINTAINERS: Update Jiri's email address
Using my kernel.org email.
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220208221117.710405-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Randy Dunlap [Mon, 14 Feb 2022 18:00:19 +0000 (10:00 -0800)]
serial: parisc: GSC: fix build when IOSAPIC is not set
There is a build error when using a kernel .config file from
'kernel test robot' for a different build problem:
hppa64-linux-ld: drivers/tty/serial/8250/8250_gsc.o: in function `.LC3':
(.data.rel.ro+0x18): undefined reference to `iosapic_serial_irq'
when:
CONFIG_GSC=y
CONFIG_SERIO_GSCPS2=y
CONFIG_SERIAL_8250_GSC=y
CONFIG_PCI is not set
and hence PCI_LBA is not set.
IOSAPIC depends on PCI_LBA, so IOSAPIC is not set/enabled.
Make the use of iosapic_serial_irq() conditional to fix the build error.
Linus Torvalds [Mon, 14 Feb 2022 17:51:26 +0000 (09:51 -0800)]
Merge tag 'regulator-fix-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One fix here, for initialisation of regulators that don't have an
in_enabled() operation which would mainly impact cases where they
aren't otherwise used during early setup for some reason"
* tag 'regulator-fix-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: core: fix false positive in regulator_late_cleanup()
HID: amd_sfh: Add interrupt handler to process interrupts
On newer AMD platforms with SFH, it is observed that random interrupts
get generated on the SFH hardware and until this is cleared the firmware
sensor processing is stalled, resulting in no data been received to
driver side.
Add routines to handle these interrupts, so that firmware operations are
not stalled.
HID: amd_sfh: Add functionality to clear interrupts
Newer AMD platforms with SFH may generate interrupts on some events
which are unwarranted. Until this is cleared the actual MP2 data
processing maybe stalled in some cases.
Add a mechanism to clear the pending interrupts (if any) during the
driver initialization and sensor command operations.
HID: amd_sfh: Handle amd_sfh work buffer in PM ops
Since in the current amd_sfh design the sensor data is periodically
obtained in the form of poll data, during the suspend/resume cycle,
scheduling a delayed work adds no value.
So, cancel the work and restart back during the suspend/resume cycle
respectively.
Oliver Neukum [Mon, 14 Feb 2022 14:08:18 +0000 (15:08 +0100)]
USB: zaurus: support another broken Zaurus
This SL-6000 says Direct Line, not Ethernet
v2: added Reporter and Link
Signed-off-by: Oliver Neukum <oneukum@suse.com> Reported-by: Ross Maynard <bids.7405@bigpond.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215361 Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Maydanik [Sat, 12 Feb 2022 10:29:27 +0000 (12:29 +0200)]
net: fix documentation for kernel_getsockname
Fixes return value documentation of kernel_getsockname()
and kernel_getpeername() functions.
The previous documentation wrongly specified that the return
value is 0 in case of success, however sock->ops->getname returns
the length of the address in bytes in case of success.
Signed-off-by: Alex Maydanik <alexander.maydanik@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: ce31671505e4 ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@mellanox.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>