Eric Dumazet [Wed, 12 Oct 2022 13:34:12 +0000 (13:34 +0000)]
kcm: avoid potential race in kcm_tx_work
syzbot found that kcm_tx_work() could crash [1] in:
/* Primarily for SOCK_SEQPACKET sockets */
if (likely(sk->sk_socket) &&
test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
<<*>> clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_space(sk);
}
I think the reason is that another thread might concurrently
run in kcm_release() and call sock_orphan(sk) while sk is not
locked. kcm_tx_work() find sk->sk_socket being NULL.
[1]
BUG: KASAN: null-ptr-deref in instrument_atomic_write include/linux/instrumented.h:86 [inline]
BUG: KASAN: null-ptr-deref in clear_bit include/asm-generic/bitops/instrumented-atomic.h:41 [inline]
BUG: KASAN: null-ptr-deref in kcm_tx_work+0xff/0x160 net/kcm/kcmsock.c:742
Write of size 8 at addr 0000000000000008 by task kworker/u4:3/53
tcp: Clean up kernel listener's reqsk in inet_twsk_purge()
Eric Dumazet reported a use-after-free related to the per-netns ehash
series. [0]
When we create a TCP socket from userspace, the socket always holds a
refcnt of the netns. This guarantees that a reqsk timer is always fired
before netns dismantle. Each reqsk has a refcnt of its listener, so the
listener is not freed before the reqsk, and the net is not freed before
the listener as well.
OTOH, when in-kernel users create a TCP socket, it might not hold a refcnt
of its netns. Thus, a reqsk timer can be fired after the netns dismantle
and access freed per-netns ehash.
To avoid the use-after-free, we need to clean up TCP_NEW_SYN_RECV sockets
in inet_twsk_purge() if the netns uses a per-netns ehash.
BUG: KASAN: use-after-free in tcp_or_dccp_get_hashinfo
include/net/inet_hashtables.h:181 [inline]
BUG: KASAN: use-after-free in reqsk_queue_unlink+0x320/0x350
net/ipv4/inet_connection_sock.c:913
Read of size 8 at addr ffff88807545bd80 by task syz-executor.2/8301
The issue was introduced by commit 248d45f1e193 ("openvswitch: Allow
attaching helper in later commit") where it somehow removed the check
of nf_ct_is_confirmed before asigning the helper. This patch is to fix
it by bringing it back.
Fixes: 248d45f1e193 ("openvswitch: Allow attaching helper in later commit") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Aaron Conole <aconole@redhat.com> Tested-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/c5c9092a22a2194650222bffaf786902613deb16.1665085502.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
tcp/udp: Fix memory leaks and data races around IPV6_ADDRFORM.
This series fixes some memory leaks and data races caused in the
same scenario where one thread converts an IPv6 socket into IPv4
with IPV6_ADDRFORM and another accesses the socket concurrently.
setsockopt(IPV6_ADDRFORM) and tcp_v6_connect() change icsk->icsk_af_ops
under lock_sock(), but tcp_(get|set)sockopt() read it locklessly. To
avoid load/store tearing, we need to add READ_ONCE() and WRITE_ONCE()
for the reads and writes.
Thanks to Eric Dumazet for providing the syzbot report:
BUG: KCSAN: data-race in tcp_setsockopt / tcp_v6_connect
write to 0xffff88813c624518 of 8 bytes by task 23936 on cpu 0:
tcp_v6_connect+0x5b3/0xce0 net/ipv6/tcp_ipv6.c:240
__inet_stream_connect+0x159/0x6d0 net/ipv4/af_inet.c:660
inet_stream_connect+0x44/0x70 net/ipv4/af_inet.c:724
__sys_connect_file net/socket.c:1976 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1993
__do_sys_connect net/socket.c:2003 [inline]
__se_sys_connect net/socket.c:2000 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:2000
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
read to 0xffff88813c624518 of 8 bytes by task 23937 on cpu 1:
tcp_setsockopt+0x147/0x1c80 net/ipv4/tcp.c:3789
sock_common_setsockopt+0x5d/0x70 net/core/sock.c:3585
__sys_setsockopt+0x212/0x2b0 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0x62/0x70 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
value changed: 0xffffffff8539af68 -> 0xffffffff8539aff8
Commit 086d49058cd8 ("ipv6: annotate some data-races around sk->sk_prot")
fixed some data-races around sk->sk_prot but it was not enough.
Some functions in inet6_(stream|dgram)_ops still access sk->sk_prot
without lock_sock() or rtnl_lock(), so they need READ_ONCE() to avoid
load tearing.
tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
IPv4 conversion by IPV6_ADDRFORM. However, commit 03485f2adcde ("udpv6:
Add lockless sendmsg() support") added a lockless memory allocation path,
which could cause a memory leak:
setsockopt(IPV6_ADDRFORM) sendmsg()
+-----------------------+ +-------+
- do_ipv6_setsockopt(sk, ...) - udpv6_sendmsg(sk, ...)
- sockopt_lock_sock(sk) ^._ called via udpv6_prot
- lock_sock(sk) before WRITE_ONCE()
- WRITE_ONCE(sk->sk_prot, &tcp_prot)
- inet6_destroy_sock() - if (!corkreq)
- sockopt_release_sock(sk) - ip6_make_skb(sk, ...)
- release_sock(sk) ^._ lockless fast path for
the non-corking case
- __ip6_append_data(sk, ...)
- ipv6_local_rxpmtu(sk, ...)
- xchg(&np->rxpmtu, skb)
^._ rxpmtu is never freed.
- goto out_no_dst;
- lock_sock(sk)
For now, rxpmtu is only the case, but not to miss the future change
and a similar bug fixed in commit e27326009a3d ("net: ping6: Fix
memleak in ipv6_renew_options()."), let's set a new function to IPv6
sk->sk_destruct() and call inet6_cleanup_sock() there. Since the
conversion does not change sk->sk_destruct(), we can guarantee that
we can clean up IPv6 resources finally.
We can now remove all inet6_destroy_sock() calls from IPv6 protocol
specific ->destroy() functions, but such changes are invasive to
backport. So they can be posted as a follow-up later for net-next.
udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
Commit 4b340ae20d0e ("IPv6: Complete IPV6_DONTFRAG support") forgot
to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
socket into IPv4 with IPV6_ADDRFORM. After conversion, sk_prot is
changed to udp_prot and ->destroy() never cleans it up, resulting in
a memory leak.
This is due to the discrepancy between inet6_destroy_sock() and
IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
to remove the difference.
However, this is not enough for now because rxpmtu can be changed
without lock_sock() after commit 03485f2adcde ("udpv6: Add lockless
sendmsg() support"). We will fix this case in the following patch.
Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
in the future.
syzbot reported a memory leak [0] related to IPV6_ADDRFORM.
The scenario is that while one thread is converting an IPv6 socket into
IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
allocates memory to inet6_sk(sk)->XXX after conversion.
Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
which inet6_destroy_sock() should have cleaned up.
setsockopt(IPV6_ADDRFORM) setsockopt(IPV6_DSTOPTS)
+-----------------------+ +----------------------+
- do_ipv6_setsockopt(sk, ...)
- sockopt_lock_sock(sk) - do_ipv6_setsockopt(sk, ...)
- lock_sock(sk) ^._ called via tcpv6_prot
- WRITE_ONCE(sk->sk_prot, &tcp_prot) before WRITE_ONCE()
- xchg(&np->opt, NULL)
- txopt_put(opt)
- sockopt_release_sock(sk)
- release_sock(sk) - sockopt_lock_sock(sk)
- lock_sock(sk)
- ipv6_set_opt_hdr(sk, ...)
- ipv6_update_options(sk, opt)
- xchg(&inet6_sk(sk)->opt, opt)
^._ opt is never freed.
- sockopt_release_sock(sk)
- release_sock(sk)
Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
acquiring the lock.
This issue exists from the initial commit between IPV6_ADDRFORM and
IPV6_PKTOPTIONS.
Jeremy Kerr [Wed, 12 Oct 2022 02:08:51 +0000 (10:08 +0800)]
mctp: prevent double key removal and unref
Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
close may race, as we attempt to remove a key from lists twice, and
perform an unref for each removal operation. This may result in a uaf
when we attempt the second unref.
This change fixes the race by making __mctp_key_remove tolerant to being
called on a key that has already been removed from the socket/net lists,
and only performs the unref when we do the actual remove. We also need
to hold the list lock on the ioctl cleanup path.
This fix is based on a bug report and comprehensive analysis from
butt3rflyh4ck <butterflyhuangxx@gmail.com>, found via syzkaller.
Cc: stable@vger.kernel.org Fixes: 63ed1aab3d40 ("mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Oct 2022 12:29:07 +0000 (13:29 +0100)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter fixes for net
This series from Phil Sutter for the *net* tree fixes a problem with a change
from the 6.1 development phase: the change to nft_fib should have used
the more recent flowic_l3mdev field. Pointed out by Guillaume Nault.
This also makes the older iptables module follow the same pattern.
Also add selftest case and avoid test failure in nft_fib.sh when the
host environment has set rp_filter=1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Wed, 5 Oct 2022 15:34:36 +0000 (17:34 +0200)]
selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from bbe4c0896d250 ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.
Fixes: bbe4c0896d250 ("selftests: netfilter: disable rp_filter on router") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Wed, 5 Oct 2022 16:07:05 +0000 (18:07 +0200)]
netfilter: rpfilter/fib: Populate flowic_l3mdev field
Use the introduced field for correct operation with VRF devices instead
of conditionally overwriting flowic_oif. This is a partial revert of
commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by
mistake"), implementing a simpler solution.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet [Tue, 11 Oct 2022 22:07:48 +0000 (15:07 -0700)]
tcp: cdg: allow tcp_cdg_release() to be called multiple times
Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]
Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.
[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Oct 2022 08:10:02 +0000 (09:10 +0100)]
Merge branch 'inet-ping-fixes'
Eric Dumazet says:
====================
inet: ping: give ping some care
First patch fixes an ipv6 ping bug that has been there forever,
for large sizes.
Second patch fixes a recent and elusive bug, that can potentially
crash the host. This is what I mentioned privately to Paolo and
Jakub at LPC in Dublin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 11 Oct 2022 21:27:29 +0000 (14:27 -0700)]
inet: ping: fix recent breakage
Blamed commit broke the assumption used by ping sendmsg() that
allocated skb would have MAX_HEADER bytes in skb->head.
This patch changes the way ping works, by making sure
the skb head contains space for the icmp header,
and adjusting ping_getfrag() which was desperate
about going past the icmp header :/
This is adopting what UDP does, mostly.
syzbot is able to crash a host using both kfence and following repro in a loop.
When kfence triggers, skb->head only has 64 bytes, immediately followed
by struct skb_shared_info (no extra headroom based on ksize(ptr))
Then icmpv6_push_pending_frames() is overwriting first bytes
of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
and/or setting shinfo->gso_size to a non zero value.
If nr_frags is mangled, a crash happens in skb_release_data()
If gso_size is mangled, we have the following report:
Fixes: 47cf88993c91 ("net: unify alloclen calculation for paged requests") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 11 Oct 2022 21:27:28 +0000 (14:27 -0700)]
ipv6: ping: fix wrong checksum for large frames
For a given ping datagram, ping_getfrag() is called once
per skb fragment.
A large datagram requiring more than one page fragment
is currently getting the checksum of the last fragment,
instead of the cumulative one.
After this patch, "ping -s 35000 ::1" is working correctly.
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
am65_cpsw_nuss_register_ndevs() skips calling devlink_port_type_eth_set()
for ports without assigned netdev, triggering the following warning when
DEVLINK_PORT_TYPE_WARN_TIMEOUT elapses after 3600s:
Type was not set for devlink port.
WARNING: CPU: 0 PID: 129 at net/core/devlink.c:8095 devlink_port_type_warn+0x18/0x30
Fixes: 0680e20af5fb ("net: ethernet: ti: am65-cpsw: Fix devlink port register sequence") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 12 Oct 2022 02:04:22 +0000 (19:04 -0700)]
Merge tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.1
First set of fixes for v6.1. Quite a lot of fixes in stack but also
for mt76.
cfg80211/mac80211
- fix locking error in mac80211's hw addr change
- fix TX queue stop for internal TXQs
- handling of very small (e.g. STP TCN) packets
- two memcpy() hardening fixes
- fix probe request 6 GHz capability warning
- fix various connection prints
- fix decapsulation offload for AP VLAN
mt76
- fix rate reporting, LLC packets and receive checksum offload on specific chipsets
iwlwifi
- fix crash due to list corruption
ath11k
- fix a compiler warning with GCC 11 and KASAN
* tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases)
wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
wifi: mt76: fix receiving LLC packets on mt7615/mt7915
wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
wifi: wext: use flex array destination for memcpy()
wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
wifi: mac80211: netdev compatible TX stop for iTXQ drivers
wifi: mac80211: fix decap offload for stations on AP_VLAN interfaces
wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
wifi: mac80211: remove/avoid misleading prints
wifi: mac80211: fix probe req HE capabilities access
wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer
====================
Paolo Abeni [Tue, 11 Oct 2022 11:46:55 +0000 (13:46 +0200)]
Merge tag 'linux-can-fixes-for-6.1-20221011' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-10-11
this is a pull request of 4 patches for net/main.
Anssi Hannula and Jimmy Assarsson contribute 4 patches for the
kvaser_usb driver. A check for actual received length of USB transfers
is added, the use of an uninitialized completion is fixed, the TX
queue is re-synced after restart, and the CAN state is fixed after
restart.
* tag 'linux-can-fixes-for-6.1-20221011' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: kvaser_usb_leaf: Fix CAN state after restart
can: kvaser_usb_leaf: Fix TX queue out of sync after restart
can: kvaser_usb: Fix use of uninitialized completion
can: kvaser_usb_leaf: Fix overread with an invalid command
====================
Kalle Valo [Mon, 10 Oct 2022 16:06:38 +0000 (19:06 +0300)]
wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
Linaro reported stringop-overread warnings in ath11k (this is one of many):
drivers/net/wireless/ath/ath11k/mac.c:2238:29: error: 'ath11k_peer_assoc_h_he_limit' reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
My further investigation showed that these warnings happen on GCC 11.3 but not
with GCC 12.2, and with only the kernel config Linaro provided:
I saw the same warnings both with arm64 and x86_64 builds and KASAN seems to be
the reason triggering these warnings with GCC 11. Nobody else has reported
this so this seems to be quite rare corner case. I don't know what specific
commit started emitting this warning so I can't provide a Fixes tag. The
function hasn't been touched for a year.
I decided to workaround this by converting the pointer to a new array in stack,
and then copying the data to the new array. It's only 16 bytes anyway and this
is executed during association, so not in a hotpath.
So, it is necessary to extend the applied solution with commit 14a3aacf517a9
("iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue")
to all other cases where the station queues are invalidated and the related
lists are not emptied. Because, otherwise as before, if some new element is
added later to the list in iwl_mvm_mac_wake_tx_queue, it can match with the
old one and produce the same commented BUG.
That is, in order to avoid this problem completely, we must also remove the
related lists for the other cases when station queues are invalidated.
Fixes: cfbc6c4c5b91c ("iwlwifi: mvm: support mac80211 TXQs model") Reported-by: Petr Stourac <pstourac@redhat.com> Tested-by: Petr Stourac <pstourac@redhat.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221010081611.145027-1-jtornosm@redhat.com
Felix Fietkau [Wed, 5 Oct 2022 13:08:24 +0000 (15:08 +0200)]
wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
Checking the relevant rxd bits for the checksum information only indicates
if the checksum verification was performed by the hardware and doesn't show
actual checksum errors. Checksum errors are indicated in the info field of
the DMA descriptor. Fix packets erroneously marked as CHECKSUM_UNNECESSARY
by checking the extra bits as well.
Those bits are only passed to the driver for MMIO devices at the moment, so
limit checksum offload to those.
Felix Fietkau [Wed, 5 Oct 2022 13:08:23 +0000 (15:08 +0200)]
wifi: mt76: fix receiving LLC packets on mt7615/mt7915
When 802.3 decap offload is enabled, the hardware indicates header translation
failure, whenever either the LLC-SNAP header was not found, or a VLAN header
with an unregcognized tag is present.
In that case, the hardware inserts a 2-byte length fields after the MAC
addresses. For VLAN packets, this tag needs to be removed. However,
for 802.3 LLC packets, the length bytes should be preserved, since there
is no separate ethertype field in the data.
This fixes an issue where the length field was omitted for LLC frames, causing
them to be malformed after hardware decap.
Fixes: 1eeff0b4c1a6 ("mt76: mt7915: fix decap offload corner case with 4-addr VLAN frames") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221005130824.23371-1-nbd@nbd.name
Merge patch series "can: kvaser_usb: Various fixes"
Jimmy Assarsson <extja@kvaser.com> says:
Changes in v5: https://lore.kernel.org/all/20221010150829.199676-1-extja@kvaser.com
- Split series [1], kept only critical bug fixes that should go into
stable, since v4 got rejected [2].
Non-critical fixes are posted in a separate series.
Changes in v4: https://lore.kernel.org/all/20220903182344.139-1-extja@kvaser.com
- Add Tested-by: Anssi Hannula to
[PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
- Update commit message in
[PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
Changes in v3: https://lore.kernel.org/all/20220901122729.271-1-extja@kvaser.com
- Rebase on top of commit 1d5eeda23f36 ("can: kvaser_usb: advertise timestamping capabilities and add ioctl support")
- Add Tested-by: Anssi Hannula
- Add stable@vger.kernel.org to CC.
- Add my S-o-b to all patches
- Fix regression introduced in
[PATCH v2 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
found by Anssi Hannula
https://lore.kernel.org/all/b25bc059-d776-146d-0b3c-41aecf4bd9f8@bitwise.fi
Anssi Hannula [Mon, 10 Oct 2022 15:08:28 +0000 (17:08 +0200)]
can: kvaser_usb_leaf: Fix TX queue out of sync after restart
The TX queue seems to be implicitly flushed by the hardware during
bus-off or bus-off recovery, but the driver does not reset the TX
bookkeeping.
Despite not resetting TX bookkeeping the driver still re-enables TX
queue unconditionally, leading to "cannot find free context" /
NETDEV_TX_BUSY errors if the TX queue was full at bus-off time.
Fix that by resetting TX bookkeeping on CAN restart.
Tested with 0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778.
Cc: stable@vger.kernel.org Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010150829.199676-4-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Anssi Hannula [Mon, 10 Oct 2022 15:08:27 +0000 (17:08 +0200)]
can: kvaser_usb: Fix use of uninitialized completion
flush_comp is initialized when CMD_FLUSH_QUEUE is sent to the device and
completed when the device sends CMD_FLUSH_QUEUE_RESP.
This causes completion of uninitialized completion if the device sends
CMD_FLUSH_QUEUE_RESP before CMD_FLUSH_QUEUE is ever sent (e.g. as a
response to a flush by a previously bound driver, or a misbehaving
device).
Fix that by initializing flush_comp in kvaser_usb_init_one() like the
other completions.
This issue is only triggerable after RX URBs have been set up, i.e. the
interface has been opened at least once.
Cc: stable@vger.kernel.org Fixes: aec5fb2268b7 ("can: kvaser_usb: Add support for Kvaser USB hydra family") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010150829.199676-3-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Anssi Hannula [Mon, 10 Oct 2022 15:08:26 +0000 (17:08 +0200)]
can: kvaser_usb_leaf: Fix overread with an invalid command
For command events read from the device,
kvaser_usb_leaf_read_bulk_callback() verifies that cmd->len does not
exceed the size of the received data, but the actual kvaser_cmd handlers
will happily read any kvaser_cmd fields without checking for cmd->len.
This can cause an overread if the last cmd in the buffer is shorter than
expected for the command type (with cmd->len showing the actual short
size).
Maximum overread seems to be 22 bytes (CMD_LEAF_LOG_MESSAGE), some of
which are delivered to userspace as-is.
Fix that by verifying the length of command before handling it.
This issue can only occur after RX URBs have been set up, i.e. the
interface has been opened at least once.
Cc: stable@vger.kernel.org Fixes: 080f40a6fa28 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010150829.199676-2-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Louis Peens [Fri, 7 Oct 2022 09:21:32 +0000 (11:21 +0200)]
nfp: flower: fix incorrect struct type in GRE key_size
Looks like a copy-paste error sneaked in here at some point,
causing the key_size for these tunnels to be calculated
incorrectly. This size ends up being send to the firmware,
causing unexpected behaviour in some cases.
Fixes: 78a722af4ad9 ("nfp: flower: compile match for IPv6 tunnels") Reported-by: Chaoyong He <chaoyong.he@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20221007092132.218386-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Behún [Fri, 7 Oct 2022 08:48:44 +0000 (10:48 +0200)]
net: sfp: fill also 5gbase-r and 25gbase-r modes in sfp_parse_support()
Fill in also 5gbase-r and 25gbase-r PHY interface modes into the
phy_interface_t bitmap in sfp_parse_support().
Fixes: fd580c983031 ("net: sfp: augment SFP parsing with phy_interface_t bitmap") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221007084844.20352-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: systemport: Enable all RX descriptors for SYSTEMPORT Lite
The original commit that added support for the SYSTEMPORT Lite variant
halved the number of RX descriptors due to a confusion between the
number of descriptors and the number of descriptor words. There are 512
descriptor *words* which means 256 descriptors total.
Johannes Berg [Wed, 5 Oct 2022 21:11:43 +0000 (23:11 +0200)]
wifi: cfg80211: update hidden BSSes to avoid WARN_ON
When updating beacon elements in a non-transmitted BSS,
also update the hidden sub-entries to the same beacon
elements, so that a future update through other paths
won't trigger a WARN_ON().
The warning is triggered because the beacon elements in
the hidden BSSes that are children of the BSS should
always be the same as in the parent.
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 5 Oct 2022 19:24:10 +0000 (21:24 +0200)]
wifi: mac80211: fix crash in beacon protection for P2P-device
If beacon protection is active but the beacon cannot be
decrypted or is otherwise malformed, we call the cfg80211
API to report this to userspace, but that uses a netdev
pointer, which isn't present for P2P-Device. Fix this to
call it only conditionally to ensure cfg80211 won't crash
in the case of P2P-Device.
This fixes CVE-2022-42722.
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 9eaf183af741 ("mac80211: Report beacon protection failures to user space") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 5 Oct 2022 13:10:09 +0000 (15:10 +0200)]
wifi: mac80211_hwsim: avoid mac80211 warning on bad rate
If the tool on the other side (e.g. wmediumd) gets confused
about the rate, we hit a warning in mac80211. Silence that
by effectively duplicating the check here and dropping the
frame silently (in mac80211 it's dropped with the warning).
Johannes Berg [Fri, 30 Sep 2022 22:01:44 +0000 (00:01 +0200)]
wifi: cfg80211: avoid nontransmitted BSS list corruption
If a non-transmitted BSS shares enough information (both
SSID and BSSID!) with another non-transmitted BSS of a
different AP, then we can find and update it, and then
try to add it to the non-transmitted BSS list. We do a
search for it on the transmitted BSS, but if it's not
there (but belongs to another transmitted BSS), the list
gets corrupted.
Since this is an erroneous situation, simply fail the
list insertion in this case and free the non-transmitted
BSS.
This fixes CVE-2022-42721.
Reported-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Sönke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 30 Sep 2022 21:44:23 +0000 (23:44 +0200)]
wifi: cfg80211: fix BSS refcounting bugs
There are multiple refcounting bugs related to multi-BSSID:
- In bss_ref_get(), if the BSS has a hidden_beacon_bss, then
the bss pointer is overwritten before checking for the
transmitted BSS, which is clearly wrong. Fix this by using
the bss_from_pub() macro.
- In cfg80211_bss_update() we copy the transmitted_bss pointer
from tmp into new, but then if we release new, we'll unref
it erroneously. We already set the pointer and ref it, but
need to NULL it since it was copied from the tmp data.
- In cfg80211_inform_single_bss_data(), if adding to the non-
transmitted list fails, we unlink the BSS and yet still we
return it, but this results in returning an entry without
a reference. We shouldn't return it anyway if it was broken
enough to not get added there.
Johannes Berg [Thu, 29 Sep 2022 19:50:44 +0000 (21:50 +0200)]
wifi: cfg80211: ensure length byte is present before access
When iterating the elements here, ensure the length byte is
present before checking it to see if the entire element will
fit into the buffer.
Longer term, we should rewrite this code using the type-safe
element iteration macros that check all of this.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Reported-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 28 Sep 2022 20:07:15 +0000 (22:07 +0200)]
wifi: mac80211: fix MBSSID parsing use-after-free
When we parse a multi-BSSID element, we might point some
element pointers into the allocated nontransmitted_profile.
However, we free this before returning, causing UAF when the
relevant pointers in the parsed elements are accessed.
Fix this by not allocating the scratch buffer separately but
as part of the returned structure instead, that way, there
are no lifetime issues with it.
The scratch buffer introduction as part of the returned data
here is taken from MLO feature work done by Ilan.
This fixes CVE-2022-42719.
Fixes: 5023b14cf4df ("mac80211: support profile split between elements") Co-developed-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 28 Sep 2022 20:01:37 +0000 (22:01 +0200)]
wifi: cfg80211/mac80211: reject bad MBSSID elements
Per spec, the maximum value for the MaxBSSID ('n') indicator is 8,
and the minimum is 1 since a multiple BSSID set with just one BSSID
doesn't make sense (the # of BSSIDs is limited by 2^n).
Limit this in the parsing in both cfg80211 and mac80211, rejecting
any elements with an invalid value.
This fixes potentially bad shifts in the processing of these inside
the cfg80211_gen_new_bssid() function later.
I found this during the investigation of CVE-2022-41674 fixed by the
previous patch.
Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Fixes: 78ac51f81532 ("mac80211: support multi-bssid") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Wed, 28 Sep 2022 19:56:15 +0000 (21:56 +0200)]
wifi: cfg80211: fix u8 overflow in cfg80211_update_notlisted_nontrans()
In the copy code of the elements, we do the following calculation
to reach the end of the MBSSID element:
/* copy the IEs after MBSSID */
cpy_len = mbssid[1] + 2;
This looks fine, however, cpy_len is a u8, the same as mbssid[1],
so the addition of two can overflow. In this case the subsequent
memcpy() will overflow the allocated buffer, since it copies 256
bytes too much due to the way the allocation and memcpy() sizes
are calculated.
Fix this by using size_t for the cpy_len variable.
This fixes CVE-2022-41674.
Reported-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de> Tested-by: Soenke Huster <shuster@seemoo.tu-darmstadt.de> Fixes: 0b8fb8235be8 ("cfg80211: Parsing of Multiple BSSID information in scanning") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Yang Yingliang [Mon, 10 Oct 2022 03:39:45 +0000 (11:39 +0800)]
octeontx2-pf: mcs: fix possible memory leak in otx2_probe()
In error path after calling cn10k_mcs_init(), cn10k_mcs_free() need
be called to avoid memory leak.
Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 71d7e0850476 ("ptp: ocp: Add second GNSS device") Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sun, 9 Oct 2022 01:51:26 +0000 (09:51 +0800)]
octeontx2-af: cn10k: mcs: Fix error return code in mcs_register_interrupts()
If alloc_mem() fails in mcs_register_interrupts(), it should return error
code.
Fixes: 6c635f78c474 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sat, 8 Oct 2022 08:39:42 +0000 (16:39 +0800)]
net: dsa: fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create()
Fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create() to print
error message.
Fixes: cf5ca4ddc37a ("net: dsa: don't leave dangling pointers in dp->pl when failing") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sat, 8 Oct 2022 08:26:50 +0000 (16:26 +0800)]
octeontx2-pf: mcs: fix missing unlock in some error paths
Add the missing unlock in some error paths.
Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 7 Oct 2022 22:57:43 +0000 (15:57 -0700)]
macvlan: enforce a consistent minimal mtu
macvlan should enforce a minimal mtu of 68, even at link creation.
This patch avoids the current behavior (which could lead to crashes
in ipv6 stack if the link is brought up)
$ ip link add macvlan1 link eno1 mtu 8 type macvlan # This should fail !
$ ip link sh dev macvlan1
5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop
state DOWN mode DEFAULT group default qlen 1000
link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff
$ ip link set macvlan1 mtu 67
Error: mtu less than device minimum.
$ ip link set macvlan1 mtu 68
$ ip link set macvlan1 mtu 8
Error: mtu less than device minimum.
Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Sun, 9 Oct 2022 06:37:31 +0000 (14:37 +0800)]
mISDN: hfcpci: Fix use-after-free bug in hfcpci_softirq
The function hfcpci_softirq() is a timer handler. If it
is running, the timer_pending() will return 0 and the
del_timer_sync() in HFC_cleanup() will not be executed.
As a result, the use-after-free bug will happen. The
process is shown below:
The device is deallocated is position [1] and used in
position [2].
Fix by removing the "timer_pending" check in HFC_cleanup(),
which makes sure that the hfcpci_softirq() have finished
before the resource is deallocated.
Fixes: 009fc857c5f6 ("mISDN: fix possible use-after-free in HFC_cleanup()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), split the memcpy() of the header and the payload
so no false positive run-time overflow warning will be generated.
Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Wireless events will be sent on the appropriate channels in
wireless_send_event(). Different wireless events may have different
payload structure and size, so kernel uses **len** and **cmd** field
in struct __compat_iw_event as wireless event common LCP part, uses
**pointer** as a label to mark the position of remaining different part.
Yet the problem is that, **pointer** is a compat_caddr_t type, which may
be smaller than the relative structure at the same position. So during
wireless_send_event() tries to parse the wireless events payload, it may
trigger the memcpy() run-time destination buffer bounds checking when the
relative structure's data is copied to the position marked by **pointer**.
This patch solves it by introducing flexible-array field **ptr_bytes**,
to mark the position of the wireless events remaining part next to
LCP part. What's more, this patch also adds **ptr_len** variable in
wireless_send_event() to improve its maintainability.
Felix Fietkau [Fri, 7 Oct 2022 12:56:11 +0000 (14:56 +0200)]
wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
STP topology change notification packets only have a payload of 7 bytes,
so they get dropped due to the skb->len < hdrlen + 8 check.
Fix this by removing the extra 8 from the skb->len check and checking the
return code on the skb_copy_bits calls.
Fixes: 2d1c304cb2d5 ("cfg80211: add function for 802.3 conversion with separate output buffer") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Alexander Wetzel [Tue, 20 Sep 2022 15:55:41 +0000 (17:55 +0200)]
wifi: mac80211: netdev compatible TX stop for iTXQ drivers
Properly handle TX stop for internal queues (iTXQs) within mac80211.
mac80211 must not stop netdev queues when using mac80211 iTXQs.
For these drivers the netdev interface is created with IFF_NO_QUEUE.
While netdev still drops frames for IFF_NO_QUEUE interfaces when we stop
the netdev queues, it also prints a warning when this happens:
Assuming the mac80211 interface is called wlan0 we would get
"Virtual device wlan0 asks to queue packet!" when netdev has to drop a
frame.
This patch is keeping the harmless netdev queue starts for iTXQ drivers.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Dan Carpenter [Mon, 12 Sep 2022 15:07:16 +0000 (18:07 +0300)]
wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
Unlock before returning -EOPNOTSUPP.
Fixes: 3c06e91b40db ("wifi: mac80211: Support POWERED_ADDR_CHANGE feature") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
James Prestwood [Thu, 15 Sep 2022 19:55:53 +0000 (12:55 -0700)]
wifi: mac80211: remove/avoid misleading prints
At some point a few kernel debug prints started appearing which
indicated something was sending invalid IEs:
"bad VHT capabilities, disabling VHT"
"Invalid HE elem, Disable HE"
Turns out these were being printed because the local hardware
supported HE/VHT but the peer/AP did not. Bad/invalid indicates,
to me at least, that the IE is in some way malformed, not missing.
For the HE print (ieee80211_verify_peer_he_mcs_support) it will
now silently fail if the HE capability element is missing (still
prints if the element size is wrong).
For the VHT print, it has been removed completely and will silently
set the DISABLE_VHT flag which is consistent with how DISABLE_HT
is set.
Signed-off-by: James Prestwood <prestwoj@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
James Prestwood [Wed, 28 Sep 2022 22:49:10 +0000 (15:49 -0700)]
wifi: mac80211: fix probe req HE capabilities access
When building the probe request IEs HE support is checked for
the 6GHz band (wiphy->bands[NL80211_BAND_6GHZ]). If supported
the HE capability IE should be included according to the spec.
The problem is the 16-bit capability is obtained from the
band object (sband) that was passed in, not the 6GHz band
object (sband6). If the sband object doesn't support HE it will
result in a warning.
Fixes: 7d29bc50b30e ("mac80211: always include HE 6GHz capability in probe request") Signed-off-by: James Prestwood <prestwoj@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Fri, 7 Oct 2022 09:05:09 +0000 (11:05 +0200)]
wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
Since STP TCN frames are only 7 bytes, the pskb_may_pull call returns an error.
Instead of dropping those packets, bump them back to the slow path for proper
processing.
Fixes: 49ddf8e6e234 ("mac80211: add fast-rx path") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
GCC-12 emits false positive -Warray-bounds warnings with
CONFIG_UBSAN_SHIFT (-fsanitize=shift). This is fixed in GCC 13[1],
and there is top-level Makefile logic to remove -Warray-bounds for
known-bad GCC versions staring with commit f0be87c42cbd ("gcc-12: disable
'-Warray-bounds' universally for now").
GCC-12 emits false positive -Warray-bounds warnings with
CONFIG_UBSAN_SHIFT (-fsanitize=shift). This is fixed in GCC 13[1],
and there is top-level Makefile logic to remove -Warray-bounds for
known-bad GCC versions staring with commit f0be87c42cbd ("gcc-12: disable
'-Warray-bounds' universally for now").
Serhiy Boiko [Thu, 6 Oct 2022 19:04:09 +0000 (22:04 +0300)]
prestera: matchall: do not rollback if rule exists
If you try to create a 'mirror' ACL rule on a port that already has a
mirror rule, prestera_span_rule_add() will fail with EEXIST error.
This forces rollback procedure which destroys existing mirror rule on
hardware leaving it visible in linux.
Add an explicit check for EEXIST to prevent the deletion of the existing
rule but keep user seeing error message:
$ tc filter add dev sw1p1 ... skip_sw action mirred egress mirror dev sw1p2
$ tc filter add dev sw1p1 ... skip_sw action mirred egress mirror dev sw1p3
RTNETLINK answers: File exists
We have an error talking to the kernel
Fixes: 13defa275eef ("net: marvell: prestera: Add matchall support") Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 6 Oct 2022 16:48:49 +0000 (10:48 -0600)]
ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference
Gwangun Jung reported a slab-out-of-bounds access in fib_nh_match:
fib_nh_match+0xf98/0x1130 linux-6.0-rc7/net/ipv4/fib_semantics.c:961
fib_table_delete+0x5f3/0xa40 linux-6.0-rc7/net/ipv4/fib_trie.c:1753
inet_rtm_delroute+0x2b3/0x380 linux-6.0-rc7/net/ipv4/fib_frontend.c:874
Separate nexthop objects are mutually exclusive with the legacy
multipath spec. Fix fib_nh_match to return if the config for the
to be deleted route contains a multipath spec while the fib_info
is using a nexthop object.
Fixes: 493ced1ac47c ("ipv4: Allow routes to use nexthop objects") Fixes: 6bf92d70e690 ("net: ipv4: fix route with nexthop object delete warning") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Thu, 6 Oct 2022 12:01:36 +0000 (20:01 +0800)]
net: enetc: Remove duplicated include in enetc_qos.c
net/pkt_sched.h is included twice in enetc_qos.c,
remove one of them.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2334 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Thu, 6 Oct 2022 11:44:00 +0000 (19:44 +0800)]
octeontx2-pf: mcs: remove unneeded semicolon
Semicolon is not required after curly braces.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2332 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gaurav Kohli [Thu, 6 Oct 2022 05:52:59 +0000 (22:52 -0700)]
hv_netvsc: Fix race between VF offering and VF association message from host
During vm boot, there might be possibility that vf registration
call comes before the vf association from host to vm.
And this might break netvsc vf path, To prevent the same block
vf registration until vf bind message comes from host.
Cc: stable@vger.kernel.org Fixes: 00d7ddba11436 ("hv_netvsc: pair VF based on serial number") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 6 Oct 2022 02:02:37 +0000 (22:02 -0400)]
net: ieee802154: return -EINVAL for unknown addr type
This patch adds handling to return -EINVAL for an unknown addr type. The
current behaviour is to return 0 as successful but the size of an
unknown addr type is not defined and should return an error like -EINVAL.
Fixes: 94160108a70c ("net/ieee802154: fix uninit value bug in dgram_sendmsg") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wenjia Zhang [Fri, 7 Oct 2022 06:54:36 +0000 (08:54 +0200)]
MAINTAINERS: add Jan as SMC maintainer
Add Jan as maintainer for Shared Memory Communications (SMC)
Sockets.
Acked-by: Jan Karcher <jaka@linux.ibm.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: pse-pd: PSE_REGULATOR should depend on REGULATOR
The Regulator based PSE controller driver relies on regulator support to
be enabled. If regulator support is disabled, it will still compile
fine, but won't operate correctly.
Hence add a dependency on REGULATOR, to prevent asking the user about
this driver when configuring a kernel without regulator support.
Vladimir Oltean [Tue, 4 Oct 2022 22:01:00 +0000 (01:01 +0300)]
Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"
taprio_attach() has this logic at the end, which should have been
removed with the blamed patch (which is now being reverted):
/* access to the child qdiscs is not needed in offload mode */
if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
kfree(q->qdiscs);
q->qdiscs = NULL;
}
because otherwise, we make use of q->qdiscs[] even after this array was
deallocated, namely in taprio_leaf(). Therefore, whenever one would try
to attach a valid child qdisc to a fully offloaded taprio root, one
would immediately dereference a NULL pointer.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
pc : taprio_leaf+0x28/0x40
lr : qdisc_leaf+0x3c/0x60
Call trace:
taprio_leaf+0x28/0x40
tc_modify_qdisc+0xf0/0x72c
rtnetlink_rcv_msg+0x12c/0x390
netlink_rcv_skb+0x5c/0x130
rtnetlink_rcv+0x1c/0x2c
The solution is not as obvious as the problem. The code which deallocates
q->qdiscs[] is in fact copied and pasted from mqprio, which also
deallocates the array in mqprio_attach() and never uses it afterwards.
Therefore, the identical cleanup logic of priv->qdiscs[] that
mqprio_destroy() has is deceptive because it will never take place at
qdisc_destroy() time, but just at raw ops->destroy() time (otherwise
said, priv->qdiscs[] do not last for the entire lifetime of the mqprio
root), but rather, this is just the twisted way in which the Qdisc API
understands error path cleanup should be done (Qdisc_ops :: destroy() is
called even when Qdisc_ops :: init() never succeeded).
Side note, in fact this is also what the comment in mqprio_init() says:
/* pre-allocate qdisc, attachment can't fail */
Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as
data passing between Qdisc_ops :: init() and Qdisc_ops :: attach().
[ this comment was also copied and pasted into the initial taprio
commit, even though taprio_attach() came way later ]
The problem is that taprio also makes extensive use of the q->qdiscs[]
array in the software fast path (taprio_enqueue() and taprio_dequeue()),
but it does not keep a reference of its own on q->qdiscs[i] (you'd think
that since it creates these Qdiscs, it holds the reference, but nope,
this is not completely true).
To understand the difference between taprio_destroy() and mqprio_destroy()
one must look before commit 13511704f8d7 ("net: taprio offload: enforce
qdisc to netdev queue mapping"), because that just muddied the waters.
In the "original" taprio design, taprio always attached itself (the root
Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go
through taprio_enqueue().
It also called qdisc_refcount_inc() on itself for as many times as there
were netdev TX queues, in order to counter-balance what tc_get_qdisc()
does when destroying a Qdisc (simplified for brevity below):
if (n->nlmsg_type == RTM_DELQDISC)
err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack);
qdisc_graft(where "new" is NULL so this deletes the Qdisc):
for (i = 0; i < num_q; i++) {
struct netdev_queue *dev_queue;
dev_queue = netdev_get_tx_queue(dev, i);
old = dev_graft_qdisc(dev_queue, new);
if (new && i > 0)
qdisc_refcount_inc(new);
qdisc_put(old);
~~~~~~~~~~~~~~
this decrements taprio's refcount once for each TX queue
}
notify_and_destroy(net, skb, n, classid,
rtnl_dereference(dev->qdisc), new);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and this finally decrements it to zero,
making qdisc_put() call qdisc_destroy()
The q->qdiscs[] created using qdisc_create_dflt() (or their
replacements, if taprio_graft() was ever to get called) were then
privately freed by taprio_destroy().
This is still what is happening after commit 13511704f8d7 ("net: taprio
offload: enforce qdisc to netdev queue mapping"), but only for software
mode.
In full offload mode, the per-txq "qdisc_put(old)" calls from
qdisc_graft() now deallocate the child Qdiscs rather than decrement
taprio's refcount. So when notify_and_destroy(taprio) finally calls
taprio_destroy(), the difference is that the child Qdiscs were already
deallocated.
And this is exactly why the taprio_attach() comment "access to the child
qdiscs is not needed in offload mode" is deceptive too. Not only the
q->qdiscs[] array is not needed, but it is also necessary to get rid of
it as soon as possible, because otherwise, we will also call qdisc_put()
on the child Qdiscs in qdisc_destroy() -> taprio_destroy(), and this
will cause a nasty use-after-free/refcount-saturate/whatever.
In short, the problem is that since the blamed commit, taprio_leaf()
needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy()
-> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach()
for full offload. Fixing one problem triggers the other.
All of this can be solved by making taprio keep its q->qdiscs[i] with a
refcount elevated at 2 (in offloaded mode where they are attached to the
netdev TX queues), both in taprio_attach() and in taprio_graft(). The
generic qdisc_graft() would just decrement the child qdiscs' refcounts
to 1, and taprio_destroy() would give them the final coup de grace.
However the rabbit hole of changes is getting quite deep, and the
complexity increases. The blamed commit was supposed to be a bug fix in
the first place, and the bug it addressed is not so significant so as to
justify further rework in stable trees. So I'd rather just revert it.
I don't know enough about multi-queue Qdisc design to make a proper
judgement right now regarding what is/isn't idiomatic use of Qdisc
concepts in taprio. I will try to study the problem more and come with a
different solution in net-next.
Fixes: 1461d212ab27 ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs") Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot is hitting skb_assert_len() warning at __dev_queue_xmit() [1],
for PF_IEEE802154 socket's zero-sized raw_sendmsg() request is hitting
__dev_queue_xmit() with skb->len == 0.
Since PF_IEEE802154 socket's zero-sized raw_sendmsg() request was
able to return 0, don't call __dev_queue_xmit() if packet length is 0.
Note that this might be a sign that commit fd1894224407c484 ("bpf: Don't
redirect packets with invalid pkt_len") should be reverted, for
skb->len == 0 was acceptable for at least PF_IEEE802154 socket.
Linus Torvalds [Tue, 4 Oct 2022 20:38:03 +0000 (13:38 -0700)]
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO. This
significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions. Expose
crash_kexec() to allow BPF to trigger a kernel dump. Use
CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs. Only
structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols:
- WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
(MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API:
- Allow selecting the conduit interface used by each port in DSA
switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one of
the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much as
possible. It's one of those magic knobs which seemed like a good
idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers:
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers:
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay window
offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
Linus Torvalds [Tue, 4 Oct 2022 18:13:38 +0000 (11:13 -0700)]
Merge tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"Improve user help for Landlock (documentation and sample)"
* tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix documentation style
landlock: Slightly improve documentation and fix spelling
samples/landlock: Print hints about ABI versions
Linus Torvalds [Tue, 4 Oct 2022 18:05:43 +0000 (11:05 -0700)]
Merge tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Six audit patches for v6.1, most are pretty trivial, but a quick list
of the highlights are below:
- Only free the audit proctitle information on task exit. This allows
us to cache the information and improve performance slightly.
- Use the time_after() macro to do time comparisons instead of doing
it directly and potentially causing ourselves problems when the
timer wraps.
- Convert an audit_context state comparison from a relative enum
comparison, e.g. (x < y), to a not-equal comparison to ensure that
we are not caught out at some unknown point in the future by an
enum shuffle.
- A handful of small cleanups such as tidying up comments and
removing unused declarations"
* tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove selinux_audit_rule_update() declaration
audit: use time_after to compare time
audit: free audit_proctitle only on task exit
audit: explicitly check audit_context->context enum value
audit: audit_context pid unused, context enum comment fix
audit: fix repeated words in comments
Linus Torvalds [Tue, 4 Oct 2022 17:24:11 +0000 (10:24 -0700)]
Merge tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- The usual round of smaller fixes and cleanups all over the tree
* tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32
x86: Fix various duplicate-word comment typos
x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h
Linus Torvalds [Tue, 4 Oct 2022 17:14:58 +0000 (10:14 -0700)]
Merge tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:
- More work by James Morse to disentangle the resctrl filesystem
generic code from the architectural one with the endgoal of plugging
ARM's MPAM implementation into it too so that the user interface
remains the same
- Properly restore the MSR_MISC_FEATURE_CONTROL value instead of
blindly overwriting it to 0
* tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
x86/resctrl: Rename and change the units of resctrl_cqm_threshold
x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
x86/resctrl: Abstract __rmid_read()
x86/resctrl: Allow per-rmid arch private storage to be reset
x86/resctrl: Add per-rmid arch private storage for overflow and chunks
x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
x86/resctrl: Allow update_mba_bw() to update controls directly
x86/resctrl: Remove architecture copy of mbps_val
x86/resctrl: Switch over to the resctrl mbps_val list
x86/resctrl: Create mba_sc configuration in the rdt_domain
x86/resctrl: Abstract and use supports_mba_mbps()
x86/resctrl: Remove set_mba_sc()s control array re-initialisation
x86/resctrl: Add domain offline callback for resctrl work
x86/resctrl: Group struct rdt_hw_domain cleanup
x86/resctrl: Add domain online callback for resctrl work
x86/resctrl: Merge mon_capable and mon_enabled
...
Linus Torvalds [Tue, 4 Oct 2022 17:12:08 +0000 (10:12 -0700)]
Merge tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x75 microcode loader updates from Borislav Petkov:
- Get rid of a single ksize() usage
- By popular demand, print the previous microcode revision an update
was done over
- Remove more code related to the now gone MICROCODE_OLD_INTERFACE
- Document the problems stemming from microcode late loading
* tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Track patch allocation size explicitly
x86/microcode: Print previous version of microcode after reload
x86/microcode: Remove ->request_microcode_user()
x86/microcode: Document the whole late loading problem
Linus Torvalds [Tue, 4 Oct 2022 17:00:27 +0000 (10:00 -0700)]
Merge tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Borislav Petkov:
- Drop misleading "RIP" from the opcodes dumping message
- Correct APM entry's Konfig help text
* tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Don't mention RIP in "Code: "
x86/Kconfig: Specify idle=poll instead of no-hlt
Linus Torvalds [Tue, 4 Oct 2022 16:49:50 +0000 (09:49 -0700)]
Merge tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm update from Borislav Petkov:
- Use the __builtin_ffs/ctzl() compiler builtins for the constant
argument case in the kernel's optimized ffs()/ffz() helpers in order
to make use of the compiler's constant folding optmization passes.
* tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions
x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions
Linus Torvalds [Tue, 4 Oct 2022 16:46:22 +0000 (09:46 -0700)]
Merge tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core fixes from Borislav Petkov:
- Make sure an INT3 is slapped after every unconditional retpoline JMP
as both vendors suggest
- Clean up pciserial a bit
* tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86,retpoline: Be sure to emit INT3 after JMP *%\reg
x86/earlyprintk: Clean up pciserial
Linus Torvalds [Tue, 4 Oct 2022 16:33:12 +0000 (09:33 -0700)]
Merge tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Fix the APEI MCE callback handler to consult the hardware about the
granularity of the memory error instead of hard-coding it
- Offline memory pages on Intel machines after 2 errors reported per
page
* tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Retrieve poison range from hardware
RAS/CEC: Reduce offline page threshold for Intel systems
Linus Torvalds [Tue, 4 Oct 2022 16:21:30 +0000 (09:21 -0700)]
Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Print the CPU number at segfault time.
The number printed is not always accurate (preemption is enabled at
that time) but the print string contains "likely" and after a lot of
back'n'forth on this, this was the consensus that was reached. See
thread at [1].
- After a *lot* of testing and polishing, finally the clear_user()
improvements to inline REP; STOSB by default
Linus Torvalds [Tue, 4 Oct 2022 16:13:21 +0000 (09:13 -0700)]
Merge tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RTC cleanups from Borislav Petkov:
- Cleanup x86/rtc.c and delete duplicated functionality in favor of
using the respective functionality from the RTC library
* tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/rtc: Rename mach_set_rtc_mmss() to mach_set_cmos_time()
x86/rtc: Rewrite & simplify mach_get_cmos_time() by deleting duplicated functionality
Linus Torvalds [Tue, 4 Oct 2022 15:58:02 +0000 (08:58 -0700)]
Merge tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Add support for Skylake-S CPUs to ie31200_edac
- Improve error decoding speed of the Intel drivers by avoiding the
ACPI facilities but doing decoding in the driver itself
- Other misc improvements to the Intel drivers
- The usual cleanups and fixlets all over EDAC land
* tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/i7300: Correct the i7300_exit() function name in comment
x86/sb_edac: Add row column translation for Broadwell
EDAC/i10nm: Print an extra register set of retry_rd_err_log
EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
EDAC/skx_common: Add ChipSelect ADXL component
EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
EDAC: Remove obsolete declarations in edac_module.h
EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
EDAC/skx_common: Make output format similar
EDAC/skx_common: Use driver decoder first
EDAC/mc: Drop duplicated dimm->nr_pages debug printout
EDAC/mc: Replace spaces with tabs in memtype flags definition
EDAC/wq: Remove unneeded flush_workqueue()
EDAC/ie31200: Add Skylake-S support
Borislav Petkov [Tue, 4 Oct 2022 08:00:25 +0000 (10:00 +0200)]
Merge branches 'edac-drivers' and 'edac-misc' into edac-updates-for-v6.1
Combine all queued EDAC changes for submission into v6.1:
* edac-drivers:
EDAC/ie31200: Add Skylake-S support
* edac-misc:
EDAC/i7300: Correct the i7300_exit() function name in comment
x86/sb_edac: Add row column translation for Broadwell
EDAC/i10nm: Print an extra register set of retry_rd_err_log
EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
EDAC/skx_common: Add ChipSelect ADXL component
EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
EDAC: Remove obsolete declarations in edac_module.h
EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
EDAC/skx_common: Make output format similar
EDAC/skx_common: Use driver decoder first
EDAC/mc: Drop duplicated dimm->nr_pages debug printout
EDAC/mc: Replace spaces with tabs in memtype flags definition
EDAC/wq: Remove unneeded flush_workqueue()
Linus Torvalds [Tue, 4 Oct 2022 03:33:41 +0000 (20:33 -0700)]
Merge tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull STATX_DIOALIGN support from Eric Biggers:
"Make statx() support reporting direct I/O (DIO) alignment information.
This provides a generic interface for userspace programs to determine
whether a file supports DIO, and if so with what alignment
restrictions. Specifically, STATX_DIOALIGN works on block devices, and
on regular files when their containing filesystem has implemented
support.
An interface like this has been requested for years, since the
conditions for when DIO is supported in Linux have gotten increasingly
complex over time. Today, DIO support and alignment requirements can
be affected by various filesystem features such as multi-device
support, data journalling, inline data, encryption, verity,
compression, checkpoint disabling, log-structured mode, etc.
Further complicating things, Linux v6.0 relaxed the traditional rule
of DIO needing to be aligned to the block device's logical block size;
now user buffers (but not file offsets) only need to be aligned to the
DMA alignment.
The approach of uplifting the XFS specific ioctl XFS_IOC_DIOINFO was
discarded in favor of creating a clean new interface with statx().
For more information, see the individual commits and the man page
update[1]"
Link: https://lore.kernel.org/r/20220722074229.148925-1-ebiggers@kernel.org
* tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
xfs: support STATX_DIOALIGN
f2fs: support STATX_DIOALIGN
f2fs: simplify f2fs_force_buffered_io()
f2fs: move f2fs_force_buffered_io() into file.c
ext4: support STATX_DIOALIGN
fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN
vfs: support STATX_DIOALIGN on block devices
statx: add direct I/O alignment information