syzbot reported a memory leak [0] related to IPV6_ADDRFORM.
The scenario is that while one thread is converting an IPv6 socket into
IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
allocates memory to inet6_sk(sk)->XXX after conversion.
Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
which inet6_destroy_sock() should have cleaned up.
setsockopt(IPV6_ADDRFORM) setsockopt(IPV6_DSTOPTS)
+-----------------------+ +----------------------+
- do_ipv6_setsockopt(sk, ...)
- sockopt_lock_sock(sk) - do_ipv6_setsockopt(sk, ...)
- lock_sock(sk) ^._ called via tcpv6_prot
- WRITE_ONCE(sk->sk_prot, &tcp_prot) before WRITE_ONCE()
- xchg(&np->opt, NULL)
- txopt_put(opt)
- sockopt_release_sock(sk)
- release_sock(sk) - sockopt_lock_sock(sk)
- lock_sock(sk)
- ipv6_set_opt_hdr(sk, ...)
- ipv6_update_options(sk, opt)
- xchg(&inet6_sk(sk)->opt, opt)
^._ opt is never freed.
- sockopt_release_sock(sk)
- release_sock(sk)
Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
acquiring the lock.
This issue exists from the initial commit between IPV6_ADDRFORM and
IPV6_PKTOPTIONS.
Jeremy Kerr [Wed, 12 Oct 2022 02:08:51 +0000 (10:08 +0800)]
mctp: prevent double key removal and unref
Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
close may race, as we attempt to remove a key from lists twice, and
perform an unref for each removal operation. This may result in a uaf
when we attempt the second unref.
This change fixes the race by making __mctp_key_remove tolerant to being
called on a key that has already been removed from the socket/net lists,
and only performs the unref when we do the actual remove. We also need
to hold the list lock on the ioctl cleanup path.
This fix is based on a bug report and comprehensive analysis from
butt3rflyh4ck <butterflyhuangxx@gmail.com>, found via syzkaller.
Cc: stable@vger.kernel.org Fixes: bf2078784326 ("mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control") Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Oct 2022 12:29:07 +0000 (13:29 +0100)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter fixes for net
This series from Phil Sutter for the *net* tree fixes a problem with a change
from the 6.1 development phase: the change to nft_fib should have used
the more recent flowic_l3mdev field. Pointed out by Guillaume Nault.
This also makes the older iptables module follow the same pattern.
Also add selftest case and avoid test failure in nft_fib.sh when the
host environment has set rp_filter=1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Wed, 5 Oct 2022 15:34:36 +0000 (17:34 +0200)]
selftests: netfilter: Fix nft_fib.sh for all.rp_filter=1
If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from aa1e4da7c6fc5 ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.
Fixes: aa1e4da7c6fc5 ("selftests: netfilter: disable rp_filter on router") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
Phil Sutter [Wed, 5 Oct 2022 16:07:05 +0000 (18:07 +0200)]
netfilter: rpfilter/fib: Populate flowic_l3mdev field
Use the introduced field for correct operation with VRF devices instead
of conditionally overwriting flowic_oif. This is a partial revert of
commit 07980f538cd5d ("netfilter: Fix rpfilter dropping vrf packets by
mistake"), implementing a simpler solution.
Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
Eric Dumazet [Tue, 11 Oct 2022 22:07:48 +0000 (15:07 -0700)]
tcp: cdg: allow tcp_cdg_release() to be called multiple times
Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]
Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.
[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
Fixes: c02084a8e83c ("tcp: add CDG congestion control") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 12 Oct 2022 08:10:02 +0000 (09:10 +0100)]
Merge branch 'inet-ping-fixes'
Eric Dumazet says:
====================
inet: ping: give ping some care
First patch fixes an ipv6 ping bug that has been there forever,
for large sizes.
Second patch fixes a recent and elusive bug, that can potentially
crash the host. This is what I mentioned privately to Paolo and
Jakub at LPC in Dublin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 11 Oct 2022 21:27:29 +0000 (14:27 -0700)]
inet: ping: fix recent breakage
Blamed commit broke the assumption used by ping sendmsg() that
allocated skb would have MAX_HEADER bytes in skb->head.
This patch changes the way ping works, by making sure
the skb head contains space for the icmp header,
and adjusting ping_getfrag() which was desperate
about going past the icmp header :/
This is adopting what UDP does, mostly.
syzbot is able to crash a host using both kfence and following repro in a loop.
When kfence triggers, skb->head only has 64 bytes, immediately followed
by struct skb_shared_info (no extra headroom based on ksize(ptr))
Then icmpv6_push_pending_frames() is overwriting first bytes
of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
and/or setting shinfo->gso_size to a non zero value.
If nr_frags is mangled, a crash happens in skb_release_data()
If gso_size is mangled, we have the following report:
Fixes: bc4494a69436 ("net: unify alloclen calculation for paged requests") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 11 Oct 2022 21:27:28 +0000 (14:27 -0700)]
ipv6: ping: fix wrong checksum for large frames
For a given ping datagram, ping_getfrag() is called once
per skb fragment.
A large datagram requiring more than one page fragment
is currently getting the checksum of the last fragment,
instead of the cumulative one.
After this patch, "ping -s 35000 ::1" is working correctly.
Fixes: 6b4e34c00d7c ("net: ipv6: Add IPv6 support to the ping socket.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: am65-cpsw: set correct devlink flavour for unused ports
am65_cpsw_nuss_register_ndevs() skips calling devlink_port_type_eth_set()
for ports without assigned netdev, triggering the following warning when
DEVLINK_PORT_TYPE_WARN_TIMEOUT elapses after 3600s:
Type was not set for devlink port.
WARNING: CPU: 0 PID: 129 at net/core/devlink.c:8095 devlink_port_type_warn+0x18/0x30
Fixes: 593131586739 ("net: ethernet: ti: am65-cpsw: Fix devlink port register sequence") Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 12 Oct 2022 02:04:22 +0000 (19:04 -0700)]
Merge tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Kalle Valo says:
====================
wireless fixes for v6.1
First set of fixes for v6.1. Quite a lot of fixes in stack but also
for mt76.
cfg80211/mac80211
- fix locking error in mac80211's hw addr change
- fix TX queue stop for internal TXQs
- handling of very small (e.g. STP TCN) packets
- two memcpy() hardening fixes
- fix probe request 6 GHz capability warning
- fix various connection prints
- fix decapsulation offload for AP VLAN
mt76
- fix rate reporting, LLC packets and receive checksum offload on specific chipsets
iwlwifi
- fix crash due to list corruption
ath11k
- fix a compiler warning with GCC 11 and KASAN
* tag 'wireless-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
wifi: iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue (other cases)
wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
wifi: mt76: fix receiving LLC packets on mt7615/mt7915
wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
wifi: wext: use flex array destination for memcpy()
wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
wifi: mac80211: netdev compatible TX stop for iTXQ drivers
wifi: mac80211: fix decap offload for stations on AP_VLAN interfaces
wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
wifi: mac80211: remove/avoid misleading prints
wifi: mac80211: fix probe req HE capabilities access
wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
wifi: mt76: fix rate reporting / throughput regression on mt7915 and newer
====================
Paolo Abeni [Tue, 11 Oct 2022 11:46:55 +0000 (13:46 +0200)]
Merge tag 'linux-can-fixes-for-6.1-20221011' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-10-11
this is a pull request of 4 patches for net/main.
Anssi Hannula and Jimmy Assarsson contribute 4 patches for the
kvaser_usb driver. A check for actual received length of USB transfers
is added, the use of an uninitialized completion is fixed, the TX
queue is re-synced after restart, and the CAN state is fixed after
restart.
* tag 'linux-can-fixes-for-6.1-20221011' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: kvaser_usb_leaf: Fix CAN state after restart
can: kvaser_usb_leaf: Fix TX queue out of sync after restart
can: kvaser_usb: Fix use of uninitialized completion
can: kvaser_usb_leaf: Fix overread with an invalid command
====================
Kalle Valo [Mon, 10 Oct 2022 16:06:38 +0000 (19:06 +0300)]
wifi: ath11k: mac: fix reading 16 bytes from a region of size 0 warning
Linaro reported stringop-overread warnings in ath11k (this is one of many):
drivers/net/wireless/ath/ath11k/mac.c:2238:29: error: 'ath11k_peer_assoc_h_he_limit' reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
My further investigation showed that these warnings happen on GCC 11.3 but not
with GCC 12.2, and with only the kernel config Linaro provided:
I saw the same warnings both with arm64 and x86_64 builds and KASAN seems to be
the reason triggering these warnings with GCC 11. Nobody else has reported
this so this seems to be quite rare corner case. I don't know what specific
commit started emitting this warning so I can't provide a Fixes tag. The
function hasn't been touched for a year.
I decided to workaround this by converting the pointer to a new array in stack,
and then copying the data to the new array. It's only 16 bytes anyway and this
is executed during association, so not in a hotpath.
So, it is necessary to extend the applied solution with commit 4d8c9121608c7
("iwlwifi: mvm: fix double list_add at iwl_mvm_mac_wake_tx_queue")
to all other cases where the station queues are invalidated and the related
lists are not emptied. Because, otherwise as before, if some new element is
added later to the list in iwl_mvm_mac_wake_tx_queue, it can match with the
old one and produce the same commented BUG.
That is, in order to avoid this problem completely, we must also remove the
related lists for the other cases when station queues are invalidated.
Fixes: 10f7adb0f1fef ("iwlwifi: mvm: support mac80211 TXQs model") Reported-by: Petr Stourac <pstourac@redhat.com> Tested-by: Petr Stourac <pstourac@redhat.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221010081611.145027-1-jtornosm@redhat.com
Felix Fietkau [Wed, 5 Oct 2022 13:08:24 +0000 (15:08 +0200)]
wifi: mt76: fix rx checksum offload on mt7615/mt7915/mt7921
Checking the relevant rxd bits for the checksum information only indicates
if the checksum verification was performed by the hardware and doesn't show
actual checksum errors. Checksum errors are indicated in the info field of
the DMA descriptor. Fix packets erroneously marked as CHECKSUM_UNNECESSARY
by checking the extra bits as well.
Those bits are only passed to the driver for MMIO devices at the moment, so
limit checksum offload to those.
Felix Fietkau [Wed, 5 Oct 2022 13:08:23 +0000 (15:08 +0200)]
wifi: mt76: fix receiving LLC packets on mt7615/mt7915
When 802.3 decap offload is enabled, the hardware indicates header translation
failure, whenever either the LLC-SNAP header was not found, or a VLAN header
with an unregcognized tag is present.
In that case, the hardware inserts a 2-byte length fields after the MAC
addresses. For VLAN packets, this tag needs to be removed. However,
for 802.3 LLC packets, the length bytes should be preserved, since there
is no separate ethertype field in the data.
This fixes an issue where the length field was omitted for LLC frames, causing
them to be malformed after hardware decap.
Fixes: e2d06421b3cb ("mt76: mt7915: fix decap offload corner case with 4-addr VLAN frames") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20221005130824.23371-1-nbd@nbd.name
Merge patch series "can: kvaser_usb: Various fixes"
Jimmy Assarsson <extja@kvaser.com> says:
Changes in v5: https://lore.kernel.org/all/20221010150829.199676-1-extja@kvaser.com
- Split series [1], kept only critical bug fixes that should go into
stable, since v4 got rejected [2].
Non-critical fixes are posted in a separate series.
Changes in v4: https://lore.kernel.org/all/20220903182344.139-1-extja@kvaser.com
- Add Tested-by: Anssi Hannula to
[PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
- Update commit message in
[PATCH v4 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
Changes in v3: https://lore.kernel.org/all/20220901122729.271-1-extja@kvaser.com
- Rebase on top of commit a2685de5741e ("can: kvaser_usb: advertise timestamping capabilities and add ioctl support")
- Add Tested-by: Anssi Hannula
- Add stable@vger.kernel.org to CC.
- Add my S-o-b to all patches
- Fix regression introduced in
[PATCH v2 04/15] can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device
found by Anssi Hannula
https://lore.kernel.org/all/b25bc059-d776-146d-0b3c-41aecf4bd9f8@bitwise.fi
Anssi Hannula [Mon, 10 Oct 2022 15:08:28 +0000 (17:08 +0200)]
can: kvaser_usb_leaf: Fix TX queue out of sync after restart
The TX queue seems to be implicitly flushed by the hardware during
bus-off or bus-off recovery, but the driver does not reset the TX
bookkeeping.
Despite not resetting TX bookkeeping the driver still re-enables TX
queue unconditionally, leading to "cannot find free context" /
NETDEV_TX_BUSY errors if the TX queue was full at bus-off time.
Fix that by resetting TX bookkeeping on CAN restart.
Tested with 0bfd:0124 Kvaser Mini PCI Express 2xHS FW 4.18.778.
Cc: stable@vger.kernel.org Fixes: d001c8c6a4d1 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010150829.199676-4-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Anssi Hannula [Mon, 10 Oct 2022 15:08:27 +0000 (17:08 +0200)]
can: kvaser_usb: Fix use of uninitialized completion
flush_comp is initialized when CMD_FLUSH_QUEUE is sent to the device and
completed when the device sends CMD_FLUSH_QUEUE_RESP.
This causes completion of uninitialized completion if the device sends
CMD_FLUSH_QUEUE_RESP before CMD_FLUSH_QUEUE is ever sent (e.g. as a
response to a flush by a previously bound driver, or a misbehaving
device).
Fix that by initializing flush_comp in kvaser_usb_init_one() like the
other completions.
This issue is only triggerable after RX URBs have been set up, i.e. the
interface has been opened at least once.
Cc: stable@vger.kernel.org Fixes: a55fe6fb9371 ("can: kvaser_usb: Add support for Kvaser USB hydra family") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010150829.199676-3-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Anssi Hannula [Mon, 10 Oct 2022 15:08:26 +0000 (17:08 +0200)]
can: kvaser_usb_leaf: Fix overread with an invalid command
For command events read from the device,
kvaser_usb_leaf_read_bulk_callback() verifies that cmd->len does not
exceed the size of the received data, but the actual kvaser_cmd handlers
will happily read any kvaser_cmd fields without checking for cmd->len.
This can cause an overread if the last cmd in the buffer is shorter than
expected for the command type (with cmd->len showing the actual short
size).
Maximum overread seems to be 22 bytes (CMD_LEAF_LOG_MESSAGE), some of
which are delivered to userspace as-is.
Fix that by verifying the length of command before handling it.
This issue can only occur after RX URBs have been set up, i.e. the
interface has been opened at least once.
Cc: stable@vger.kernel.org Fixes: d001c8c6a4d1 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Tested-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://lore.kernel.org/all/20221010150829.199676-2-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Louis Peens [Fri, 7 Oct 2022 09:21:32 +0000 (11:21 +0200)]
nfp: flower: fix incorrect struct type in GRE key_size
Looks like a copy-paste error sneaked in here at some point,
causing the key_size for these tunnels to be calculated
incorrectly. This size ends up being send to the firmware,
causing unexpected behaviour in some cases.
Fixes: e17d8432fb67 ("nfp: flower: compile match for IPv6 tunnels") Reported-by: Chaoyong He <chaoyong.he@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20221007092132.218386-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Marek Behún [Fri, 7 Oct 2022 08:48:44 +0000 (10:48 +0200)]
net: sfp: fill also 5gbase-r and 25gbase-r modes in sfp_parse_support()
Fill in also 5gbase-r and 25gbase-r PHY interface modes into the
phy_interface_t bitmap in sfp_parse_support().
Fixes: a4f028fe5324 ("net: sfp: augment SFP parsing with phy_interface_t bitmap") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20221007084844.20352-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: systemport: Enable all RX descriptors for SYSTEMPORT Lite
The original commit that added support for the SYSTEMPORT Lite variant
halved the number of RX descriptors due to a confusion between the
number of descriptors and the number of descriptor words. There are 512
descriptor *words* which means 256 descriptors total.
Yang Yingliang [Mon, 10 Oct 2022 03:39:45 +0000 (11:39 +0800)]
octeontx2-pf: mcs: fix possible memory leak in otx2_probe()
In error path after calling cn10k_mcs_init(), cn10k_mcs_free() need
be called to avoid memory leak.
Fixes: a500032f7534 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 0fab2327a086 ("ptp: ocp: Add second GNSS device") Signed-off-by: Vadim Fedorenko <vadfed@fb.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sun, 9 Oct 2022 01:51:26 +0000 (09:51 +0800)]
octeontx2-af: cn10k: mcs: Fix error return code in mcs_register_interrupts()
If alloc_mem() fails in mcs_register_interrupts(), it should return error
code.
Fixes: 161038f70d15 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sat, 8 Oct 2022 08:39:42 +0000 (16:39 +0800)]
net: dsa: fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create()
Fix wrong pointer passed to PTR_ERR() in dsa_port_phylink_create() to print
error message.
Fixes: 4c0e05a15acc ("net: dsa: don't leave dangling pointers in dp->pl when failing") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Sat, 8 Oct 2022 08:26:50 +0000 (16:26 +0800)]
octeontx2-pf: mcs: fix missing unlock in some error paths
Add the missing unlock in some error paths.
Fixes: a500032f7534 ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 7 Oct 2022 22:57:43 +0000 (15:57 -0700)]
macvlan: enforce a consistent minimal mtu
macvlan should enforce a minimal mtu of 68, even at link creation.
This patch avoids the current behavior (which could lead to crashes
in ipv6 stack if the link is brought up)
$ ip link add macvlan1 link eno1 mtu 8 type macvlan # This should fail !
$ ip link sh dev macvlan1
5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop
state DOWN mode DEFAULT group default qlen 1000
link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff
$ ip link set macvlan1 mtu 67
Error: mtu less than device minimum.
$ ip link set macvlan1 mtu 68
$ ip link set macvlan1 mtu 8
Error: mtu less than device minimum.
Fixes: b8d55c187b83 ("net: use core MTU range checking in core net infra") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Duoming Zhou [Sun, 9 Oct 2022 06:37:31 +0000 (14:37 +0800)]
mISDN: hfcpci: Fix use-after-free bug in hfcpci_softirq
The function hfcpci_softirq() is a timer handler. If it
is running, the timer_pending() will return 0 and the
del_timer_sync() in HFC_cleanup() will not be executed.
As a result, the use-after-free bug will happen. The
process is shown below:
The device is deallocated is position [1] and used in
position [2].
Fix by removing the "timer_pending" check in HFC_cleanup(),
which makes sure that the hfcpci_softirq() have finished
before the resource is deallocated.
Fixes: 158560b026f0 ("mISDN: fix possible use-after-free in HFC_cleanup()") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
wifi: nl80211: Split memcpy() of struct nl80211_wowlan_tcp_data_token flexible array
To work around a misbehavior of the compiler's ability to see into
composite flexible array structs (as detailed in the coming memcpy()
hardening series[1]), split the memcpy() of the header and the payload
so no false positive run-time overflow warning will be generated.
Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Wireless events will be sent on the appropriate channels in
wireless_send_event(). Different wireless events may have different
payload structure and size, so kernel uses **len** and **cmd** field
in struct __compat_iw_event as wireless event common LCP part, uses
**pointer** as a label to mark the position of remaining different part.
Yet the problem is that, **pointer** is a compat_caddr_t type, which may
be smaller than the relative structure at the same position. So during
wireless_send_event() tries to parse the wireless events payload, it may
trigger the memcpy() run-time destination buffer bounds checking when the
relative structure's data is copied to the position marked by **pointer**.
This patch solves it by introducing flexible-array field **ptr_bytes**,
to mark the position of the wireless events remaining part next to
LCP part. What's more, this patch also adds **ptr_len** variable in
wireless_send_event() to improve its maintainability.
Felix Fietkau [Fri, 7 Oct 2022 12:56:11 +0000 (14:56 +0200)]
wifi: cfg80211: fix ieee80211_data_to_8023_exthdr handling of small packets
STP topology change notification packets only have a payload of 7 bytes,
so they get dropped due to the skb->len < hdrlen + 8 check.
Fix this by removing the extra 8 from the skb->len check and checking the
return code on the skb_copy_bits calls.
Fixes: 9ce648171cb5 ("cfg80211: add function for 802.3 conversion with separate output buffer") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Alexander Wetzel [Tue, 20 Sep 2022 15:55:41 +0000 (17:55 +0200)]
wifi: mac80211: netdev compatible TX stop for iTXQ drivers
Properly handle TX stop for internal queues (iTXQs) within mac80211.
mac80211 must not stop netdev queues when using mac80211 iTXQs.
For these drivers the netdev interface is created with IFF_NO_QUEUE.
While netdev still drops frames for IFF_NO_QUEUE interfaces when we stop
the netdev queues, it also prints a warning when this happens:
Assuming the mac80211 interface is called wlan0 we would get
"Virtual device wlan0 asks to queue packet!" when netdev has to drop a
frame.
This patch is keeping the harmless netdev queue starts for iTXQ drivers.
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Dan Carpenter [Mon, 12 Sep 2022 15:07:16 +0000 (18:07 +0300)]
wifi: mac80211: unlock on error in ieee80211_can_powered_addr_change()
Unlock before returning -EOPNOTSUPP.
Fixes: f984bb19d594 ("wifi: mac80211: Support POWERED_ADDR_CHANGE feature") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
James Prestwood [Thu, 15 Sep 2022 19:55:53 +0000 (12:55 -0700)]
wifi: mac80211: remove/avoid misleading prints
At some point a few kernel debug prints started appearing which
indicated something was sending invalid IEs:
"bad VHT capabilities, disabling VHT"
"Invalid HE elem, Disable HE"
Turns out these were being printed because the local hardware
supported HE/VHT but the peer/AP did not. Bad/invalid indicates,
to me at least, that the IE is in some way malformed, not missing.
For the HE print (ieee80211_verify_peer_he_mcs_support) it will
now silently fail if the HE capability element is missing (still
prints if the element size is wrong).
For the VHT print, it has been removed completely and will silently
set the DISABLE_VHT flag which is consistent with how DISABLE_HT
is set.
Signed-off-by: James Prestwood <prestwoj@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
James Prestwood [Wed, 28 Sep 2022 22:49:10 +0000 (15:49 -0700)]
wifi: mac80211: fix probe req HE capabilities access
When building the probe request IEs HE support is checked for
the 6GHz band (wiphy->bands[NL80211_BAND_6GHZ]). If supported
the HE capability IE should be included according to the spec.
The problem is the 16-bit capability is obtained from the
band object (sband) that was passed in, not the 6GHz band
object (sband6). If the sband object doesn't support HE it will
result in a warning.
Fixes: 8bd862747463 ("mac80211: always include HE 6GHz capability in probe request") Signed-off-by: James Prestwood <prestwoj@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Fri, 7 Oct 2022 09:05:09 +0000 (11:05 +0200)]
wifi: mac80211: do not drop packets smaller than the LLC-SNAP header on fast-rx
Since STP TCN frames are only 7 bytes, the pskb_may_pull call returns an error.
Instead of dropping those packets, bump them back to the slow path for proper
processing.
Fixes: 0106f1ab8251 ("mac80211: add fast-rx path") Reported-by: Chad Monroe <chad.monroe@smartrg.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
GCC-12 emits false positive -Warray-bounds warnings with
CONFIG_UBSAN_SHIFT (-fsanitize=shift). This is fixed in GCC 13[1],
and there is top-level Makefile logic to remove -Warray-bounds for
known-bad GCC versions staring with commit 3c7e0bf4178e ("gcc-12: disable
'-Warray-bounds' universally for now").
GCC-12 emits false positive -Warray-bounds warnings with
CONFIG_UBSAN_SHIFT (-fsanitize=shift). This is fixed in GCC 13[1],
and there is top-level Makefile logic to remove -Warray-bounds for
known-bad GCC versions staring with commit 3c7e0bf4178e ("gcc-12: disable
'-Warray-bounds' universally for now").
Serhiy Boiko [Thu, 6 Oct 2022 19:04:09 +0000 (22:04 +0300)]
prestera: matchall: do not rollback if rule exists
If you try to create a 'mirror' ACL rule on a port that already has a
mirror rule, prestera_span_rule_add() will fail with EEXIST error.
This forces rollback procedure which destroys existing mirror rule on
hardware leaving it visible in linux.
Add an explicit check for EEXIST to prevent the deletion of the existing
rule but keep user seeing error message:
$ tc filter add dev sw1p1 ... skip_sw action mirred egress mirror dev sw1p2
$ tc filter add dev sw1p1 ... skip_sw action mirred egress mirror dev sw1p3
RTNETLINK answers: File exists
We have an error talking to the kernel
Fixes: 6f80ce629036 ("net: marvell: prestera: Add matchall support") Signed-off-by: Serhiy Boiko <serhiy.boiko@plvision.eu> Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 6 Oct 2022 16:48:49 +0000 (10:48 -0600)]
ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference
Gwangun Jung reported a slab-out-of-bounds access in fib_nh_match:
fib_nh_match+0xf98/0x1130 linux-6.0-rc7/net/ipv4/fib_semantics.c:961
fib_table_delete+0x5f3/0xa40 linux-6.0-rc7/net/ipv4/fib_trie.c:1753
inet_rtm_delroute+0x2b3/0x380 linux-6.0-rc7/net/ipv4/fib_frontend.c:874
Separate nexthop objects are mutually exclusive with the legacy
multipath spec. Fix fib_nh_match to return if the config for the
to be deleted route contains a multipath spec while the fib_info
is using a nexthop object.
Fixes: 1d8053328272 ("ipv4: Allow routes to use nexthop objects") Fixes: 68d57d6450d3 ("net: ipv4: fix route with nexthop object delete warning") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Thu, 6 Oct 2022 12:01:36 +0000 (20:01 +0800)]
net: enetc: Remove duplicated include in enetc_qos.c
net/pkt_sched.h is included twice in enetc_qos.c,
remove one of them.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2334 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Li [Thu, 6 Oct 2022 11:44:00 +0000 (19:44 +0800)]
octeontx2-pf: mcs: remove unneeded semicolon
Semicolon is not required after curly braces.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2332 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gaurav Kohli [Thu, 6 Oct 2022 05:52:59 +0000 (22:52 -0700)]
hv_netvsc: Fix race between VF offering and VF association message from host
During vm boot, there might be possibility that vf registration
call comes before the vf association from host to vm.
And this might break netvsc vf path, To prevent the same block
vf registration until vf bind message comes from host.
Cc: stable@vger.kernel.org Fixes: 86bec1e3179c5 ("hv_netvsc: pair VF based on serial number") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Thu, 6 Oct 2022 02:02:37 +0000 (22:02 -0400)]
net: ieee802154: return -EINVAL for unknown addr type
This patch adds handling to return -EINVAL for an unknown addr type. The
current behaviour is to return 0 as successful but the size of an
unknown addr type is not defined and should return an error like -EINVAL.
Fixes: 3b8d22d8a4dc ("net/ieee802154: fix uninit value bug in dgram_sendmsg") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wenjia Zhang [Fri, 7 Oct 2022 06:54:36 +0000 (08:54 +0200)]
MAINTAINERS: add Jan as SMC maintainer
Add Jan as maintainer for Shared Memory Communications (SMC)
Sockets.
Acked-by: Jan Karcher <jaka@linux.ibm.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: pse-pd: PSE_REGULATOR should depend on REGULATOR
The Regulator based PSE controller driver relies on regulator support to
be enabled. If regulator support is disabled, it will still compile
fine, but won't operate correctly.
Hence add a dependency on REGULATOR, to prevent asking the user about
this driver when configuring a kernel without regulator support.
Vladimir Oltean [Tue, 4 Oct 2022 22:01:00 +0000 (01:01 +0300)]
Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"
taprio_attach() has this logic at the end, which should have been
removed with the blamed patch (which is now being reverted):
/* access to the child qdiscs is not needed in offload mode */
if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
kfree(q->qdiscs);
q->qdiscs = NULL;
}
because otherwise, we make use of q->qdiscs[] even after this array was
deallocated, namely in taprio_leaf(). Therefore, whenever one would try
to attach a valid child qdisc to a fully offloaded taprio root, one
would immediately dereference a NULL pointer.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
pc : taprio_leaf+0x28/0x40
lr : qdisc_leaf+0x3c/0x60
Call trace:
taprio_leaf+0x28/0x40
tc_modify_qdisc+0xf0/0x72c
rtnetlink_rcv_msg+0x12c/0x390
netlink_rcv_skb+0x5c/0x130
rtnetlink_rcv+0x1c/0x2c
The solution is not as obvious as the problem. The code which deallocates
q->qdiscs[] is in fact copied and pasted from mqprio, which also
deallocates the array in mqprio_attach() and never uses it afterwards.
Therefore, the identical cleanup logic of priv->qdiscs[] that
mqprio_destroy() has is deceptive because it will never take place at
qdisc_destroy() time, but just at raw ops->destroy() time (otherwise
said, priv->qdiscs[] do not last for the entire lifetime of the mqprio
root), but rather, this is just the twisted way in which the Qdisc API
understands error path cleanup should be done (Qdisc_ops :: destroy() is
called even when Qdisc_ops :: init() never succeeded).
Side note, in fact this is also what the comment in mqprio_init() says:
/* pre-allocate qdisc, attachment can't fail */
Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as
data passing between Qdisc_ops :: init() and Qdisc_ops :: attach().
[ this comment was also copied and pasted into the initial taprio
commit, even though taprio_attach() came way later ]
The problem is that taprio also makes extensive use of the q->qdiscs[]
array in the software fast path (taprio_enqueue() and taprio_dequeue()),
but it does not keep a reference of its own on q->qdiscs[i] (you'd think
that since it creates these Qdiscs, it holds the reference, but nope,
this is not completely true).
To understand the difference between taprio_destroy() and mqprio_destroy()
one must look before commit 8f05635a203c ("net: taprio offload: enforce
qdisc to netdev queue mapping"), because that just muddied the waters.
In the "original" taprio design, taprio always attached itself (the root
Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go
through taprio_enqueue().
It also called qdisc_refcount_inc() on itself for as many times as there
were netdev TX queues, in order to counter-balance what tc_get_qdisc()
does when destroying a Qdisc (simplified for brevity below):
if (n->nlmsg_type == RTM_DELQDISC)
err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack);
qdisc_graft(where "new" is NULL so this deletes the Qdisc):
for (i = 0; i < num_q; i++) {
struct netdev_queue *dev_queue;
dev_queue = netdev_get_tx_queue(dev, i);
old = dev_graft_qdisc(dev_queue, new);
if (new && i > 0)
qdisc_refcount_inc(new);
qdisc_put(old);
~~~~~~~~~~~~~~
this decrements taprio's refcount once for each TX queue
}
notify_and_destroy(net, skb, n, classid,
rtnl_dereference(dev->qdisc), new);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and this finally decrements it to zero,
making qdisc_put() call qdisc_destroy()
The q->qdiscs[] created using qdisc_create_dflt() (or their
replacements, if taprio_graft() was ever to get called) were then
privately freed by taprio_destroy().
This is still what is happening after commit 8f05635a203c ("net: taprio
offload: enforce qdisc to netdev queue mapping"), but only for software
mode.
In full offload mode, the per-txq "qdisc_put(old)" calls from
qdisc_graft() now deallocate the child Qdiscs rather than decrement
taprio's refcount. So when notify_and_destroy(taprio) finally calls
taprio_destroy(), the difference is that the child Qdiscs were already
deallocated.
And this is exactly why the taprio_attach() comment "access to the child
qdiscs is not needed in offload mode" is deceptive too. Not only the
q->qdiscs[] array is not needed, but it is also necessary to get rid of
it as soon as possible, because otherwise, we will also call qdisc_put()
on the child Qdiscs in qdisc_destroy() -> taprio_destroy(), and this
will cause a nasty use-after-free/refcount-saturate/whatever.
In short, the problem is that since the blamed commit, taprio_leaf()
needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy()
-> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach()
for full offload. Fixing one problem triggers the other.
All of this can be solved by making taprio keep its q->qdiscs[i] with a
refcount elevated at 2 (in offloaded mode where they are attached to the
netdev TX queues), both in taprio_attach() and in taprio_graft(). The
generic qdisc_graft() would just decrement the child qdiscs' refcounts
to 1, and taprio_destroy() would give them the final coup de grace.
However the rabbit hole of changes is getting quite deep, and the
complexity increases. The blamed commit was supposed to be a bug fix in
the first place, and the bug it addressed is not so significant so as to
justify further rework in stable trees. So I'd rather just revert it.
I don't know enough about multi-queue Qdisc design to make a proper
judgement right now regarding what is/isn't idiomatic use of Qdisc
concepts in taprio. I will try to study the problem more and come with a
different solution in net-next.
Fixes: 8e2333f89969 ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs") Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
syzbot is hitting skb_assert_len() warning at __dev_queue_xmit() [1],
for PF_IEEE802154 socket's zero-sized raw_sendmsg() request is hitting
__dev_queue_xmit() with skb->len == 0.
Since PF_IEEE802154 socket's zero-sized raw_sendmsg() request was
able to return 0, don't call __dev_queue_xmit() if packet length is 0.
Note that this might be a sign that commit fe1f468a3ebcab50 ("bpf: Don't
redirect packets with invalid pkt_len") should be reverted, for
skb->len == 0 was acceptable for at least PF_IEEE802154 socket.
Linus Torvalds [Tue, 4 Oct 2022 20:38:03 +0000 (13:38 -0700)]
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO. This
significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions. Expose
crash_kexec() to allow BPF to trigger a kernel dump. Use
CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs. Only
structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols:
- WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
(MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API:
- Allow selecting the conduit interface used by each port in DSA
switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one of
the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much as
possible. It's one of those magic knobs which seemed like a good
idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers:
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers:
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay window
offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
Linus Torvalds [Tue, 4 Oct 2022 18:13:38 +0000 (11:13 -0700)]
Merge tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"Improve user help for Landlock (documentation and sample)"
* tag 'landlock-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Fix documentation style
landlock: Slightly improve documentation and fix spelling
samples/landlock: Print hints about ABI versions
Linus Torvalds [Tue, 4 Oct 2022 18:05:43 +0000 (11:05 -0700)]
Merge tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Six audit patches for v6.1, most are pretty trivial, but a quick list
of the highlights are below:
- Only free the audit proctitle information on task exit. This allows
us to cache the information and improve performance slightly.
- Use the time_after() macro to do time comparisons instead of doing
it directly and potentially causing ourselves problems when the
timer wraps.
- Convert an audit_context state comparison from a relative enum
comparison, e.g. (x < y), to a not-equal comparison to ensure that
we are not caught out at some unknown point in the future by an
enum shuffle.
- A handful of small cleanups such as tidying up comments and
removing unused declarations"
* tag 'audit-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove selinux_audit_rule_update() declaration
audit: use time_after to compare time
audit: free audit_proctitle only on task exit
audit: explicitly check audit_context->context enum value
audit: audit_context pid unused, context enum comment fix
audit: fix repeated words in comments
Linus Torvalds [Tue, 4 Oct 2022 17:24:11 +0000 (10:24 -0700)]
Merge tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- The usual round of smaller fixes and cleanups all over the tree
* tag 'x86_cleanups_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Include the header of init_ia32_feat_ctl()'s prototype
x86/uaccess: Improve __try_cmpxchg64_user_asm() for x86_32
x86: Fix various duplicate-word comment typos
x86/boot: Remove superfluous type casting from arch/x86/boot/bitops.h
Linus Torvalds [Tue, 4 Oct 2022 17:14:58 +0000 (10:14 -0700)]
Merge tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control updates from Borislav Petkov:
- More work by James Morse to disentangle the resctrl filesystem
generic code from the architectural one with the endgoal of plugging
ARM's MPAM implementation into it too so that the user interface
remains the same
- Properly restore the MSR_MISC_FEATURE_CONTROL value instead of
blindly overwriting it to 0
* tag 'x86_cache_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes
x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data
x86/resctrl: Rename and change the units of resctrl_cqm_threshold
x86/resctrl: Move get_corrected_mbm_count() into resctrl_arch_rmid_read()
x86/resctrl: Move mbm_overflow_count() into resctrl_arch_rmid_read()
x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read()
x86/resctrl: Abstract __rmid_read()
x86/resctrl: Allow per-rmid arch private storage to be reset
x86/resctrl: Add per-rmid arch private storage for overflow and chunks
x86/resctrl: Calculate bandwidth from the previous __mon_event_count() chunks
x86/resctrl: Allow update_mba_bw() to update controls directly
x86/resctrl: Remove architecture copy of mbps_val
x86/resctrl: Switch over to the resctrl mbps_val list
x86/resctrl: Create mba_sc configuration in the rdt_domain
x86/resctrl: Abstract and use supports_mba_mbps()
x86/resctrl: Remove set_mba_sc()s control array re-initialisation
x86/resctrl: Add domain offline callback for resctrl work
x86/resctrl: Group struct rdt_hw_domain cleanup
x86/resctrl: Add domain online callback for resctrl work
x86/resctrl: Merge mon_capable and mon_enabled
...
Linus Torvalds [Tue, 4 Oct 2022 17:12:08 +0000 (10:12 -0700)]
Merge tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x75 microcode loader updates from Borislav Petkov:
- Get rid of a single ksize() usage
- By popular demand, print the previous microcode revision an update
was done over
- Remove more code related to the now gone MICROCODE_OLD_INTERFACE
- Document the problems stemming from microcode late loading
* tag 'x86_microcode_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Track patch allocation size explicitly
x86/microcode: Print previous version of microcode after reload
x86/microcode: Remove ->request_microcode_user()
x86/microcode: Document the whole late loading problem
Linus Torvalds [Tue, 4 Oct 2022 17:00:27 +0000 (10:00 -0700)]
Merge tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Borislav Petkov:
- Drop misleading "RIP" from the opcodes dumping message
- Correct APM entry's Konfig help text
* tag 'x86_misc_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/dumpstack: Don't mention RIP in "Code: "
x86/Kconfig: Specify idle=poll instead of no-hlt
Linus Torvalds [Tue, 4 Oct 2022 16:49:50 +0000 (09:49 -0700)]
Merge tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm update from Borislav Petkov:
- Use the __builtin_ffs/ctzl() compiler builtins for the constant
argument case in the kernel's optimized ffs()/ffz() helpers in order
to make use of the compiler's constant folding optmization passes.
* tag 'x86_asm_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions
x86/asm/bitops: Use __builtin_ffs() to evaluate constant expressions
Linus Torvalds [Tue, 4 Oct 2022 16:46:22 +0000 (09:46 -0700)]
Merge tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core fixes from Borislav Petkov:
- Make sure an INT3 is slapped after every unconditional retpoline JMP
as both vendors suggest
- Clean up pciserial a bit
* tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86,retpoline: Be sure to emit INT3 after JMP *%\reg
x86/earlyprintk: Clean up pciserial
Linus Torvalds [Tue, 4 Oct 2022 16:33:12 +0000 (09:33 -0700)]
Merge tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Fix the APEI MCE callback handler to consult the hardware about the
granularity of the memory error instead of hard-coding it
- Offline memory pages on Intel machines after 2 errors reported per
page
* tag 'ras_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Retrieve poison range from hardware
RAS/CEC: Reduce offline page threshold for Intel systems
Linus Torvalds [Tue, 4 Oct 2022 16:21:30 +0000 (09:21 -0700)]
Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov:
- Print the CPU number at segfault time.
The number printed is not always accurate (preemption is enabled at
that time) but the print string contains "likely" and after a lot of
back'n'forth on this, this was the consensus that was reached. See
thread at [1].
- After a *lot* of testing and polishing, finally the clear_user()
improvements to inline REP; STOSB by default
Linus Torvalds [Tue, 4 Oct 2022 16:13:21 +0000 (09:13 -0700)]
Merge tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RTC cleanups from Borislav Petkov:
- Cleanup x86/rtc.c and delete duplicated functionality in favor of
using the respective functionality from the RTC library
* tag 'x86_timers_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/rtc: Rename mach_set_rtc_mmss() to mach_set_cmos_time()
x86/rtc: Rewrite & simplify mach_get_cmos_time() by deleting duplicated functionality
Linus Torvalds [Tue, 4 Oct 2022 15:58:02 +0000 (08:58 -0700)]
Merge tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
- Add support for Skylake-S CPUs to ie31200_edac
- Improve error decoding speed of the Intel drivers by avoiding the
ACPI facilities but doing decoding in the driver itself
- Other misc improvements to the Intel drivers
- The usual cleanups and fixlets all over EDAC land
* tag 'edac_updates_for_v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/i7300: Correct the i7300_exit() function name in comment
x86/sb_edac: Add row column translation for Broadwell
EDAC/i10nm: Print an extra register set of retry_rd_err_log
EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
EDAC/skx_common: Add ChipSelect ADXL component
EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
EDAC: Remove obsolete declarations in edac_module.h
EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
EDAC/skx_common: Make output format similar
EDAC/skx_common: Use driver decoder first
EDAC/mc: Drop duplicated dimm->nr_pages debug printout
EDAC/mc: Replace spaces with tabs in memtype flags definition
EDAC/wq: Remove unneeded flush_workqueue()
EDAC/ie31200: Add Skylake-S support
Borislav Petkov [Tue, 4 Oct 2022 08:00:25 +0000 (10:00 +0200)]
Merge branches 'edac-drivers' and 'edac-misc' into edac-updates-for-v6.1
Combine all queued EDAC changes for submission into v6.1:
* edac-drivers:
EDAC/ie31200: Add Skylake-S support
* edac-misc:
EDAC/i7300: Correct the i7300_exit() function name in comment
x86/sb_edac: Add row column translation for Broadwell
EDAC/i10nm: Print an extra register set of retry_rd_err_log
EDAC/i10nm: Retrieve and print retry_rd_err_log registers for HBM
EDAC/skx_common: Add ChipSelect ADXL component
EDAC/ppc_4xx: Reorder symbols to get rid of a few forward declarations
EDAC: Remove obsolete declarations in edac_module.h
EDAC/i10nm: Add driver decoder for Ice Lake and Tremont CPUs
EDAC/skx_common: Make output format similar
EDAC/skx_common: Use driver decoder first
EDAC/mc: Drop duplicated dimm->nr_pages debug printout
EDAC/mc: Replace spaces with tabs in memtype flags definition
EDAC/wq: Remove unneeded flush_workqueue()
Linus Torvalds [Tue, 4 Oct 2022 03:33:41 +0000 (20:33 -0700)]
Merge tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull STATX_DIOALIGN support from Eric Biggers:
"Make statx() support reporting direct I/O (DIO) alignment information.
This provides a generic interface for userspace programs to determine
whether a file supports DIO, and if so with what alignment
restrictions. Specifically, STATX_DIOALIGN works on block devices, and
on regular files when their containing filesystem has implemented
support.
An interface like this has been requested for years, since the
conditions for when DIO is supported in Linux have gotten increasingly
complex over time. Today, DIO support and alignment requirements can
be affected by various filesystem features such as multi-device
support, data journalling, inline data, encryption, verity,
compression, checkpoint disabling, log-structured mode, etc.
Further complicating things, Linux v6.0 relaxed the traditional rule
of DIO needing to be aligned to the block device's logical block size;
now user buffers (but not file offsets) only need to be aligned to the
DMA alignment.
The approach of uplifting the XFS specific ioctl XFS_IOC_DIOINFO was
discarded in favor of creating a clean new interface with statx().
For more information, see the individual commits and the man page
update[1]"
Link: https://lore.kernel.org/r/20220722074229.148925-1-ebiggers@kernel.org
* tag 'statx-dioalign-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
xfs: support STATX_DIOALIGN
f2fs: support STATX_DIOALIGN
f2fs: simplify f2fs_force_buffered_io()
f2fs: move f2fs_force_buffered_io() into file.c
ext4: support STATX_DIOALIGN
fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN
vfs: support STATX_DIOALIGN on block devices
statx: add direct I/O alignment information
Linus Torvalds [Tue, 4 Oct 2022 03:27:34 +0000 (20:27 -0700)]
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
"Minor changes to convert uses of kmap() to kmap_local_page()"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: use kmap_local_page() instead of kmap()
fs-verity: use memcpy_from_page()
Linus Torvalds [Tue, 4 Oct 2022 03:18:34 +0000 (20:18 -0700)]
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
"This release contains some implementation changes, but no new
features:
- Rework the implementation of the fscrypt filesystem-level keyring
to not be as tightly coupled to the keyrings subsystem. This
resolves several issues.
- Eliminate most direct uses of struct request_queue from fs/crypto/,
since struct request_queue is considered to be a block layer
implementation detail.
- Stop using the PG_error flag to track decryption failures. This is
a prerequisite for freeing up PG_error for other uses"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: work on block_devices instead of request_queues
fscrypt: stop holding extra request_queue references
fscrypt: stop using keyrings subsystem for fscrypt_master_key
fscrypt: stop using PG_error to track error status
fscrypt: remove fscrypt_set_test_dummy_encryption()
Linus Torvalds [Tue, 4 Oct 2022 03:11:59 +0000 (20:11 -0700)]
Merge tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- Fix a couple races found with a new torture test
- Improve errors when api functions are used incorrectly
- Improve tracing for lock requests from user space
- Fix use after free in recently added tracing cod.
- Small internal code cleanups
* tag 'dlm-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
fs: dlm: fix possible use after free if tracing
fs: dlm: const void resource name parameter
fs: dlm: LSFL_CB_DELAY only for kernel lockspaces
fs: dlm: remove DLM_LSFL_FS from uapi
fs: dlm: trace user space callbacks
fs: dlm: change ls_clear_proc_locks to spinlock
fs: dlm: remove dlm_del_ast prototype
fs: dlm: handle rcom in else if branch
fs: dlm: allow lockspaces have zero lvblen
fs: dlm: fix invalid derefence of sb_lvbptr
fs: dlm: handle -EINVAL as log_error()
fs: dlm: use __func__ for function name
fs: dlm: handle -EBUSY first in unlock validation
fs: dlm: handle -EBUSY first in lock arg validation
fs: dlm: fix race between test_bit() and queue_work()
fs: dlm: fix race in lowcomms
Linus Torvalds [Tue, 4 Oct 2022 03:07:15 +0000 (20:07 -0700)]
Merge tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This release is mostly bug fixes, clean-ups, and optimizations.
One notable set of fixes addresses a subtle buffer overflow issue that
occurs if a small RPC Call message arrives in an oversized RPC record.
This is only possible on a framed RPC transport such as TCP.
Because NFSD shares the receive and send buffers in one set of pages,
an oversized RPC record steals pages from the send buffer that will be
used to construct the RPC Reply message. NFSD must not assume that a
full-sized buffer is always available to it; otherwise, it will walk
off the end of the send buffer while constructing its reply.
In this release, we also introduce the ability for the server to wait
a moment for clients to return delegations before it responds with
NFS4ERR_DELAY. This saves a retransmit and a network round- trip when
a delegation recall is needed. This work will be built upon in future
releases.
The NFS server adds another shrinker to its collection. Because
courtesy clients can linger for quite some time, they might be
freeable when the server host comes under memory pressure. A new
shrinker has been added that releases courtesy client resources during
low memory scenarios.
Lastly, of note: the maximum number of operations per NFSv4 COMPOUND
that NFSD can handle is increased from 16 to 50. There are NFSv4
client implementations that need more than 16 to successfully perform
a mount operation that uses a pathname with many components"
* tag 'nfsd-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (53 commits)
nfsd: extra checks when freeing delegation stateids
nfsd: make nfsd4_run_cb a bool return function
nfsd: fix comments about spinlock handling with delegations
nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
NFSD: fix use-after-free on source server when doing inter-server copy
NFSD: Cap rsize_bop result based on send buffer size
NFSD: Rename the fields in copy_stateid_t
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
NFSD: Pack struct nfsd4_compoundres
NFSD: Remove unused nfsd4_compoundargs::cachetype field
NFSD: Remove "inline" directives on op_rsize_bop helpers
NFSD: Clean up nfs4svc_encode_compoundres()
SUNRPC: Fix typo in xdr_buf_subsegment's kdoc comment
NFSD: Clean up WRITE arg decoders
NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
NFSD: Refactor common code out of dirlist helpers
...
Linus Torvalds [Tue, 4 Oct 2022 03:01:40 +0000 (20:01 -0700)]
Merge tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, for container use cases, fscache-based shared domain is
introduced [1] so that data blobs in the same domain will be storage
deduplicated and it will also be used for page cache sharing later.
Also, a special packed inode is now introduced to record inode
fragments which keep the tail part of files by Yue Hu [2]. You can
keep arbitary length or (at will) the whole file as a fragment and
then fragments can be optionally compressed in the packed inode
together and even deduplicated for smaller image sizes.
In addition to that, global compressed data deduplication by sharing
partial-referenced pclusters is also supported in this cycle.
Summary:
- Introduce fscache-based domain to share blobs between images
- Support recording fragments in a special packed inode
- Support partial-referenced pclusters for global compressed data
deduplication
- Fix an order >= MAX_ORDER warning due to crafted negative i_size
- Several cleanups"
Link: https://lore.kernel.org/r/20220916085940.89392-1-zhujia.zj@bytedance.com Link: https://lore.kernel.org/r/cover.1663065968.git.huyue2@coolpad.com
* tag 'erofs-for-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: clean up erofs_iget()
erofs: clean up unnecessary code and comments
erofs: fold in z_erofs_reload_indexes()
erofs: introduce partial-referenced pclusters
erofs: support on-disk compressed fragments data
erofs: support interlaced uncompressed data for compressed files
erofs: clean up .read_folio() and .readahead() in fscache mode
erofs: introduce 'domain_id' mount option
erofs: Support sharing cookies in the same domain
erofs: introduce a pseudo mnt to manage shared cookies
erofs: introduce fscache-based domain
erofs: code clean up for fscache
erofs: use kill_anon_super() to kill super in fscache mode
erofs: fix order >= MAX_ORDER warning due to crafted negative i_size
Linus Torvalds [Tue, 4 Oct 2022 02:54:29 +0000 (19:54 -0700)]
Merge tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull fatfs vfsuid conversion from Christian Brauner:
"Last cycle we introduced the new vfs{g,u}id_t types that we had agreed
on. The most important parts of the vfs have been converted but there
are a few more places we need to switch before we can remove the old
helpers completely.
This cycle we converted all filesystems that called idmapped mount
helpers directly. The affected filesystems are f2fs, fat, fuse, ksmbd,
overlayfs, and xfs. We've sent patches for all of them. Looking at
-next f2fs, ksmbd, overlayfs, and xfs have all picked up these patches
and they should land in mainline during the v6.1 merge window.
So all filesystems that have a separate tree should send the vfsuid
conversion themselves. Onle the fat conversion is going through this
generic fs trees because there is no fat tree.
In order to change time settings on an inode fat checks that the
caller either is the owner of the inode or the inode's group is in the
caller's group list. If fat is on an idmapped mount we compare whether
the inode mapped into the mount is equivalent to the caller's fsuid.
If it isn't we compare whether the inode's group mapped into the mount
is in the caller's group list.
We now use the new vfsuid based helpers for that"
* tag 'fs.vfsuid.fat.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fat: port to vfs{g,u}id_t and associated helpers
Linus Torvalds [Tue, 4 Oct 2022 02:48:54 +0000 (19:48 -0700)]
Merge tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs acl updates from Christian Brauner:
"These are general fixes and preparatory changes related to the ongoing
posix acl rework. The actual rework where we build a type safe posix
acl api wasn't ready for this merge window but we're hopeful for the
next merge window.
General fixes:
- Some filesystems like 9p and cifs have to implement custom posix
acl handlers because they require access to the dentry in order to
set and get posix acls while the set and get inode operations
currently don't. But the ntfs3 filesystem has no such requirement
and thus implemented custom posix acl xattr handlers when it really
didn't have to. So this pr contains patch that just implements set
and get inode operations for ntfs3 and switches it to rely on the
generic posix acl xattr handlers. (We would've appreciated reviews
from the ntfs3 maintainers but we didn't get any. But hey, if we
really broke it we'll fix it. But fstests for ntfs3 said it's
fine.)
- The posix_acl_fix_xattr_common() helper has been adapted so it can
be used by a few more callers and avoiding open-coding the same
checks over and over.
Other than the two general fixes this series introduces a new helper
vfs_set_acl_prepare(). The reason for this helper is so that we can
mitigate one of the source that change {g,u}id values directly in the
uapi struct. With the vfs_set_acl_prepare() helper we can move the
idmapped mount fixup into the generic posix acl set handler.
The advantage of this is that it allows us to remove the
posix_acl_setxattr_idmapped_mnt() helper which so far we had to call
in vfs_setxattr() to account for idmapped mounts. While semantically
correct the problem with this approach was that we had to keep the
value parameter of the generic vfs_setxattr() call as non-const. This
is rectified in this series.
Ultimately, we will get rid of all the extreme kludges and type
unsafety once we have merged the posix api - hopefully during the next
merge window - built solely around get and set inode operations. Which
incidentally will also improve handling of posix acls in security and
especially in integrity modesl. While this will come with temporarily
having two inode operation for posix acls that is nothing compared to
the problems we have right now and so well worth it. We'll end up with
something that we can actually reason about instead of needing to
write novels to explain what's going on"
* tag 'fs.acl.rework.prep.v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
xattr: always us is_posix_acl_xattr() helper
acl: fix the comments of posix_acl_xattr_set
xattr: constify value argument in vfs_setxattr()
ovl: use vfs_set_acl_prepare()
acl: move idmapping handling into posix_acl_xattr_set()
acl: add vfs_set_acl_prepare()
acl: return EOPNOTSUPP in posix_acl_fix_xattr_common()
ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers
Linus Torvalds [Tue, 4 Oct 2022 00:51:52 +0000 (17:51 -0700)]
Merge tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
"Seven patches for the LSM layer and we've got a mix of trivial and
significant patches. Highlights below, starting with the smaller bits
first so they don't get lost in the discussion of the larger items:
- Remove some redundant NULL pointer checks in the common LSM audit
code.
- Ratelimit the lockdown LSM's access denial messages.
With this change there is a chance that the last visible lockdown
message on the console is outdated/old, but it does help preserve
the initial series of lockdown denials that started the denial
message flood and my gut feeling is that these might be the more
valuable messages.
- Open userfaultfds as readonly instead of read/write.
While this code obviously lives outside the LSM, it does have a
noticeable impact on the LSMs with Ondrej explaining the situation
in the commit description. It is worth noting that this patch
languished on the VFS list for over a year without any comments
(objections or otherwise) so I took the liberty of pulling it into
the LSM tree after giving fair notice. It has been in linux-next
since the end of August without any noticeable problems.
- Add a LSM hook for user namespace creation, with implementations
for both the BPF LSM and SELinux.
Even though the changes are fairly small, this is the bulk of the
diffstat as we are also including BPF LSM selftests for the new
hook.
It's also the most contentious of the changes in this pull request
with Eric Biederman NACK'ing the LSM hook multiple times during its
development and discussion upstream. While I've never taken NACK's
lightly, I'm sending these patches to you because it is my belief
that they are of good quality, satisfy a long-standing need of
users and distros, and are in keeping with the existing nature of
the LSM layer and the Linux Kernel as a whole.
The patches in implement a LSM hook for user namespace creation
that allows for a granular approach, configurable at runtime, which
enables both monitoring and control of user namespaces. The general
consensus has been that this is far preferable to the other
solutions that have been adopted downstream including outright
removal from the kernel, disabling via system wide sysctls, or
various other out-of-tree mechanisms that users have been forced to
adopt since we haven't been able to provide them an upstream
solution for their requests. Eric has been steadfast in his
objections to this LSM hook, explaining that any restrictions on
the user namespace could have significant impact on userspace.
While there is the possibility of impacting userspace, it is
important to note that this solution only impacts userspace when it
is requested based on the runtime configuration supplied by the
distro/admin/user. Frederick (the pathset author), the LSM/security
community, and myself have tried to work with Eric during
development of this patchset to find a mutually acceptable
solution, but Eric's approach and unwillingness to engage in a
meaningful way have made this impossible. I have CC'd Eric directly
on this pull request so he has a chance to provide his side of the
story; there have been no objections outside of Eric's"
* tag 'lsm-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lockdown: ratelimit denial messages
userfaultfd: open userfaultfds with O_RDONLY
selinux: Implement userns_create hook
selftests/bpf: Add tests verifying bpf lsm userns_create hook
bpf-lsm: Make bpf_lsm_userns_create() sleepable
security, lsm: Introduce security_create_user_ns()
lsm: clean up redundant NULL pointer check
Linus Torvalds [Tue, 4 Oct 2022 00:45:15 +0000 (17:45 -0700)]
Merge tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"Six SELinux patches, all are simple and easily understood, but a list
of the highlights is below:
- Use 'grep -E' instead of 'egrep' in the SELinux policy install
script.
Fun fact, this seems to be GregKH's *second* dedicated SELinux
patch since we transitioned to git (ignoring merges, the SPDX
stuff, and a trivial fs reference removal when lustre was yanked);
the first was back in 2011 when selinuxfs was placed in
/sys/fs/selinux. Oh, the memories ...
- Convert the SELinux policy boolean values to use signed integer
types throughout the SELinux kernel code.
Prior to this we were using a mix of signed and unsigned integers
which was probably okay in this particular case, but it is
definitely not a good idea in general.
- Remove a reference to the SELinux runtime disable functionality in
/etc/selinux/config as we are in the process of deprecating that.
See [1] for more background on this if you missed the previous
notes on the deprecation.
- Minor cleanups: remove unneeded variables and function parameter
constification"
Link: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable
* tag 'selinux-pr-20221003' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: remove runtime disable message in the install_policy.sh script
selinux: use "grep -E" instead of "egrep"
selinux: remove the unneeded result variable
selinux: declare read-only parameters const
selinux: use int arrays for boolean values
selinux: remove an unneeded variable in sel_make_class_dir_entries()
Merge in the left-over fixes before the net-next pull-request.
Conflicts:
drivers/net/ethernet/mediatek/mtk_ppe.c e9553af106a9 ("net: ethernet: mtk_eth_soc: fix state in __mtk_foe_entry_clear") e17be8581378 ("net: ethernet: mtk_eth_soc: add foe_entry_size to mtk_eth_soc")
https://lore.kernel.org/all/6cb6893b-4921-a068-4c30-1109795110bb@tessares.net/
kernel/bpf/helpers.c 9d674861c77d ("bpf: Gate dynptr API behind CAP_BPF") 0b894e472a94 ("bpf: Move bpf_loop and bpf_for_each_map_elem under CAP_BPF") fecdb8943871 ("bpf: expose bpf_strtol and bpf_strtoul to all program types")
https://lore.kernel.org/all/20221003201957.13149-1-daniel@iogearbox.net/
Linus Torvalds [Tue, 4 Oct 2022 00:42:12 +0000 (17:42 -0700)]
Merge tag 'integrity-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Just two bug fixes"
* tag 'integrity-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
efi: Correct Macmini DMI match in uefi cert quirk
ima: fix blocking of security.ima xattrs of unsupported algorithms
Linus Torvalds [Tue, 4 Oct 2022 00:38:09 +0000 (17:38 -0700)]
Merge tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next
Pull smack updates from Casey Schaufler:
"Two minor code clean-ups: one removes constants left over from the old
mount API, while the other gets rid of an unneeded variable.
The other change fixes a flaw in handling IPv6 labeling"
* tag 'Smack-for-6.1' of https://github.com/cschaufler/smack-next:
smack: cleanup obsolete mount option flags
smack: lsm: remove the unneeded result variable
SMACK: Add sk_clone_security LSM hook
The _SLOW designation wasn't really descriptive of anything. This is
meant to be called from process context when it's possible to sleep. So
name this more aptly _SLEEPABLE, which better fits its intended use.
Fixes: 4021e1a2ce29 ("once: add DO_ONCE_SLOW() for sleepable contexts") Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221003181413.1221968-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Oleksij Rempel [Mon, 3 Oct 2022 06:52:00 +0000 (08:52 +0200)]
ethtool: add interface to interact with Ethernet Power Equipment
Add interface to support Power Sourcing Equipment. At current step it
provides generic way to address all variants of PSE devices as defined
in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4
PoDL Power Sourcing Equipment (PSE).
Currently supported and mandatory objects are:
IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus
IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState
IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl
This is minimal interface needed to control PSE on each separate
ethernet port but it provides not all mandatory objects specified in
IEEE 802.3-2018.
Since "PoDL PSE" and "PSE" have similar names, but some different values
I decide to not merge them and keep separate naming schema. This should
allow as to be as close to IEEE 802.3 spec as possible and avoid name
conflicts in the future.
This implementation is connected to PHYs instead of MACs because PSE
auto classification can potentially interfere with PHY auto negotiation.
So, may be some extra PHY related initialization will be needed.
With WIP version of ethtools interaction with PSE capable link looks
as following:
$ ip l
...
5: t1l1@eth0: <BROADCAST,MULTICAST> ..
...
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: disabled
PoDL PSE Power Detection Status: disabled
$ ethtool --set-pse t1l1 podl-pse-admin-control enable
$ ethtool --show-pse t1l1
PSE attributs for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power
Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>