Kele Huang [Thu, 14 Oct 2021 03:19:52 +0000 (11:19 +0800)]
ptp: fix error print of ptp_kvm on X86_64 platform
Commit a86ed2cfa13c5 ("ptp: Don't print an error if ptp_kvm is not supported")
fixes the error message print on ARM platform by only concerning about
the case that the error returned from kvm_arch_ptp_init() is not -EOPNOTSUPP.
Although the ARM platform returns -EOPNOTSUPP if ptp_kvm is not supported
while X86_64 platform returns -KVM_EOPNOTSUPP, both error codes share the
same value 95.
Actually kvm_arch_ptp_init() on X86_64 platform can return three kinds of
errors (-KVM_ENOSYS, -KVM_EOPNOTSUPP and -KVM_EFAULT). The problem is that
-KVM_EOPNOTSUPP is masked out and -KVM_EFAULT is ignored among them.
This patch fixes this by returning them to ptp_kvm_init() respectively.
Signed-off-by: Kele Huang <huangkele@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
- ipv6: ioam: move the check for undefined bits to improve
interoperability"
* tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe
MAINTAINERS: Update the devicetree documentation path of imx fec driver
sctp: account stream padding length for reconf chunk
mlxsw: thermal: Fix out-of-bounds memory accesses
ethernet: s2io: fix setting mac address during resume
NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
nfc: fix error handling of nfc_proto_register()
Revert "net: procfs: add seq_puts() statement for dev_mcast"
net: encx24j600: check error in devm_regmap_init_encx24j600
net: korina: select CRC32
net: arc: select CRC32
net: dsa: felix: break at first CPU port during init and teardown
net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
net: mscc: ocelot: deny TX timestamping of non-PTP packets
net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
...
Xin Long [Thu, 14 Oct 2021 09:50:50 +0000 (05:50 -0400)]
icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe
In icmp_build_probe(), the icmp_ext_echo_iio parsing should be done
step by step and skb_header_pointer() return value should always be
checked, this patch fixes 3 places in there:
- On case ICMP_EXT_ECHO_CTYPE_NAME, it should only copy ident.name
from skb by skb_header_pointer(), its len is ident_len. Besides,
the return value of skb_header_pointer() should always be checked.
- On case ICMP_EXT_ECHO_CTYPE_INDEX, move ident_len check ahead of
skb_header_pointer(), and also do the return value check for
skb_header_pointer().
- On case ICMP_EXT_ECHO_CTYPE_ADDR, before accessing iio->ident.addr.
ctype3_hdr.addrlen, skb_header_pointer() should be called first,
then check its return value and ident_len.
On subcases ICMP_AFI_IP and ICMP_AFI_IP6, also do check for ident.
addr.ctype3_hdr.addrlen and skb_header_pointer()'s return value.
On subcase ICMP_AFI_IP, the len for skb_header_pointer() should be
"sizeof(iio->extobj_hdr) + sizeof(iio->ident.addr.ctype3_hdr) +
sizeof(struct in_addr)" or "ident_len".
v1->v2:
- To make it more clear, call skb_header_pointer() once only for
iio->indent's parsing as Jakub Suggested.
v2->v3:
- The extobj_hdr.length check against sizeof(_iio) should be done
before calling skb_header_pointer(), as Eric noticed.
Cai Huoqing [Thu, 14 Oct 2021 11:02:14 +0000 (19:02 +0800)]
MAINTAINERS: Update the devicetree documentation path of imx fec driver
Change the devicetree documentation path
to "Documentation/devicetree/bindings/net/fsl,fec.yaml"
since 'fsl-fec.txt' has been converted to 'fsl,fec.yaml' already.
Eiichi Tsukata [Wed, 13 Oct 2021 20:27:29 +0000 (17:27 -0300)]
sctp: account stream padding length for reconf chunk
sctp_make_strreset_req() makes repeated calls to sctp_addto_chunk()
which will automatically account for padding on each call. inreq and
outreq are already 4 bytes aligned, but the payload is not and doing
SCTP_PAD4(a + b) (which _sctp_make_chunk() did implicitly here) is
different from SCTP_PAD4(a) + SCTP_PAD4(b) and not enough. It led to
possible attempt to use more buffer than it was allocated and triggered
a BUG_ON.
This results in out-of-bounds memory accesses when thermal state
transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the
transition table is accessed with a too large index (state) [1].
According to the thermal maintainer, it is the responsibility of the
driver to reject such operations [2].
Therefore, return an error when the state to be set exceeds the maximum
cooling state supported by the driver.
To avoid dead code, as suggested by the thermal maintainer [3],
partially revert commit a421ce088ac8 ("mlxsw: core: Extend cooling
device with cooling levels") that tried to interpret these invalid
cooling states (above the maximum) in a special way. The cooling levels
array is not removed in order to prevent the fans going below 20% PWM,
which would cause them to get stuck at 0% PWM.
[1]
BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5
Memory state around the buggy address: ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Arnd Bergmann [Wed, 13 Oct 2021 14:35:49 +0000 (16:35 +0200)]
ethernet: s2io: fix setting mac address during resume
After recent cleanups, gcc started warning about a suspicious
memcpy() call during the s2io_io_resume() function:
In function '__dev_addr_set',
inlined from 'eth_hw_addr_set' at include/linux/etherdevice.h:318:2,
inlined from 's2io_set_mac_addr' at drivers/net/ethernet/neterion/s2io.c:5205:2,
inlined from 's2io_io_resume' at drivers/net/ethernet/neterion/s2io.c:8569:7:
arch/x86/include/asm/string_32.h:182:25: error: '__builtin_memcpy' accessing 6 bytes at offsets 0 and 2 overlaps 4 bytes at offset 2 [-Werror=restrict]
182 | #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
| ^~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/netdevice.h:4648:9: note: in expansion of macro 'memcpy'
4648 | memcpy(dev->dev_addr, addr, len);
| ^~~~~~
What apparently happened is that an old cleanup changed the calling
conventions for s2io_set_mac_addr() from taking an ethernet address
as a character array to taking a struct sockaddr, but one of the
callers was not changed at the same time.
Change it to instead call the low-level do_s2io_prog_unicast() function
that still takes the old argument type.
Linus Torvalds [Thu, 14 Oct 2021 13:53:36 +0000 (09:53 -0400)]
Merge tag 'sound-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains quite a few device-specific fixes for usual HD- and
USB-audio in addition to a couple of ALSA core fixes (a UAF fix in
sequencer and a fix for a misplaced PCM 32bit compat ioctl).
Nothing really stands out"
* tag 'sound-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Add quirk for VF0770
ALSA: hda: avoid write to STATESTS if controller is in reset
ALSA: hda/realtek: Fix the mic type detection issue for ASUS G551JW
ALSA: pcm: Workaround for a wrong offset in SYNC_PTR compat ioctl
ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo 13s Gen2
ALSA: hda: intel: Allow repeatedly probing on codec configuration errors
ALSA: hda/realtek: Add quirk for TongFang PHxTxX1
ALSA: hda/realtek - ALC236 headset MIC recording issue
ALSA: usb-audio: Enable rate validation for Scarlett devices
ALSA: hda/realtek: Add quirk for Clevo X170KM-G
ALSA: hda/realtek: Complete partial device name to avoid ambiguity
ALSA: hda - Enable headphone mic on Dell Latitude laptops with ALC3254
ALSA: seq: Fix a potential UAF by wrong private_free call order
ALSA: hda/realtek: Enable 4-speaker output for Dell Precision 5560 laptop
ALSA: usb-audio: Fix a missing error check in scarlett gen2 mixer
Ziyang Xuan [Wed, 13 Oct 2021 07:50:32 +0000 (15:50 +0800)]
NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
'skb' is allocated in digital_in_send_sdd_req(), but not free when
digital_in_send_cmd() failed, which will cause memory leak. Fix it
by freeing 'skb' if digital_in_send_cmd() return failed.
Fixes: 2c66daecc409 ("NFC Digital: Add NFC-A technology support") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziyang Xuan [Wed, 13 Oct 2021 07:50:12 +0000 (15:50 +0800)]
NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
'params' is allocated in digital_tg_listen_mdaa(), but not free when
digital_send_cmd() failed, which will cause memory leak. Fix it by
freeing 'params' if digital_send_cmd() return failed.
Fixes: 1c7a4c24fbfd ("NFC Digital: Add target NFC-DEP support") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ziyang Xuan [Wed, 13 Oct 2021 03:49:32 +0000 (11:49 +0800)]
nfc: fix error handling of nfc_proto_register()
When nfc proto id is using, nfc_proto_register() return -EBUSY error
code, but forgot to unregister proto. Fix it by adding proto_unregister()
in the error handling case.
It turns out that there are user space programs which got broken by that
change. One example is the "ifstat" program shipped by Debian:
https://packages.debian.org/source/bullseye/ifstat
which, confusingly enough, seems to not have anything in common with the
much more familiar (at least to me) ifstat program from iproute2:
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/misc/ifstat.c
The reason why the ifstat shipped by Debian (v1.1, with a Debian patch
upgrading it to 1.1-8.1 at the time of writing) is broken is because its
"proc" driver/backend parses the header very literally:
main/drivers.c#L825
if (!data->checked && strncmp(buf, "Inter-|", 7))
goto badproc;
and there's no way in which the header can be changed such that programs
parsing like that would not get broken.
Even if we fix this ancient and very "lightly" maintained program to
parse the text output of /proc/net/dev in a more sensible way, this
story seems bound to repeat again with other programs, and modifying
them all could cause more trouble than it's worth. On the other hand,
the reverted patch had no other reason than an aesthetic one, so
reverting it is the simplest way out.
I don't know what other distributions would be affected; the fact that
Debian doesn't ship the iproute2 version of the program (a different
code base altogether, which uses netlink and not /proc/net/dev) is
surprising in itself.
Nanyong Sun [Tue, 12 Oct 2021 12:59:01 +0000 (20:59 +0800)]
net: encx24j600: check error in devm_regmap_init_encx24j600
devm_regmap_init may return error which caused by like out of memory,
this will results in null pointer dereference later when reading
or writing register:
Jakub Kicinski [Wed, 13 Oct 2021 20:39:54 +0000 (13:39 -0700)]
Merge tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-10-12
* tag 'mlx5-fixes-2021-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
net/mlx5e: Switchdev representors are not vlan challenged
net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
net/mlx5e: Allow only complete TXQs partition in MQPRIO channel mode
net/mlx5: Fix cleanup of bridge delayed work
====================
Vegard Nossum [Tue, 12 Oct 2021 09:34:46 +0000 (11:34 +0200)]
net: arc: select CRC32
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/ethernet/arc/emac_main.o: in function `arc_emac_set_rx_mode':
emac_main.c:(.text+0xb11): undefined reference to `crc32_le'
The crc32_le() call comes through the ether_crc_le() call in
arc_emac_set_rx_mode().
[v2: moved the select to ARC_EMAC_CORE; the Makefile is a bit confusing,
but the error comes from emac_main.o, which is part of the arc_emac module,
which in turn is enabled by CONFIG_ARC_EMAC_CORE. Note that arc_emac is
different from emac_arc...]
Jakub Kicinski [Wed, 13 Oct 2021 00:35:21 +0000 (17:35 -0700)]
Merge branch 'felix-dsa-driver-fixes'
Vladimir Oltean says:
====================
Felix DSA driver fixes
This is an assorted collection of fixes for issues seen on the NXP
LS1028A switch.
- PTP packet drops due to switch congestion result in catastrophic
damage to the driver's state
- loops are not blocked by STP if using the ocelot-8021q tagger
- driver uses the wrong CPU port when two of them are defined in DT
- module autoloading is broken* with both tagging protocol drivers
(ocelot and ocelot-8021q)
Changes in v2:
- Stop printing that we aren't going to take TX timestamps if we don't
have TX timestamping anyway, and we are just carrying PTP frames for a
cascaded DSA switch.
- Shorten the deferred xmit kthread name so that it fits the 16
character limit (TASK_COMM_LEN)
====================
Vladimir Oltean [Tue, 12 Oct 2021 11:40:44 +0000 (14:40 +0300)]
net: dsa: felix: break at first CPU port during init and teardown
The NXP LS1028A switch has two Ethernet ports towards the CPU, but only
one of them is capable of acting as an NPI port at a time (inject and
extract packets using DSA tags).
However, using the alternative ocelot-8021q tagging protocol, it should
be possible to use both CPU ports symmetrically, but for that we need to
mark both ports in the device tree as DSA masters.
In the process of doing that, it can be seen that traffic to/from the
network stack gets broken, and this is because the Felix driver iterates
through all DSA CPU ports and configures them as NPI ports. But since
there can only be a single NPI port, we effectively end up in a
situation where DSA thinks the default CPU port is the first one, but
the hardware port configured to be an NPI is the last one.
I would like to treat this as a bug, because if the updated device trees
are going to start circulating, it would be really good for existing
kernels to support them, too.
Fixes: adb3dccf090b ("net: dsa: felix: convert to the new .change_tag_protocol DSA API") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:43 +0000 (14:40 +0300)]
net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
When setting up a bridge with stp_state 1, topology changes are not
detected and loops are not blocked. This is because the standard way of
transmitting a packet, based on VLAN IDs redirected by VCAP IS2 to the
right egress port, does not override the port STP state (in the case of
Ocelot switches, that's really the PGID_SRC masks).
To force a packet to be injected into a port that's BLOCKING, we must
send it as a control packet, which means in the case of this tagger to
send it using the manual register injection method. We already do this
for PTP frames, extend the logic to apply to any link-local MAC DA.
Fixes: 7c83a7c539ab ("net: dsa: add a second tagger for Ocelot switches based on tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:42 +0000 (14:40 +0300)]
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
At present, when a PTP packet which requires TX timestamping gets
dropped under congestion by the switch, things go downhill very fast.
The driver keeps a clone of that skb in a queue of packets awaiting TX
timestamp interrupts, but interrupts will never be raised for the
dropped packets.
Moreover, matching timestamped packets to timestamps is done by a 2-bit
timestamp ID, and this can wrap around and we can match on the wrong skb.
Since with the default NPI-based tagging protocol, we get no notification
about packet drops, the best we can do is eventually recover from the
drop of a PTP frame: its skb will be dead memory until another skb which
was assigned the same timestamp ID happens to find it.
However, with the ocelot-8021q tagger which injects packets using the
manual register interface, it appears that we can check for more
information, such as:
- whether the input queue has reached the high watermark or not
- whether the injection group's FIFO can accept additional data or not
so we know that a PTP frame is likely to get dropped before actually
sending it, and drop it ourselves (because DSA uses NETIF_F_LLTX, so it
can't return NETDEV_TX_BUSY to ask the qdisc to requeue the packet).
But when we do that, we can also remove the skb from the timestamping
queue, because there surely won't be any timestamp that matches it.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:41 +0000 (14:40 +0300)]
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
Michael reported that when using the "ocelot-8021q" tagging protocol,
the switch driver module must be manually loaded before the tagging
protocol can be loaded/is available.
This appears to be the same problem described here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
where due to the fact that DSA tagging protocols make use of symbols
exported by the switch drivers, circular dependencies appear and this
breaks module autoloading.
The ocelot_8021q driver needs the ocelot_can_inject() and
ocelot_port_inject_frame() functions from the switch library. Previously
the wrong approach was taken to solve that dependency: shims were
provided for the case where the ocelot switch library was compiled out,
but that turns out to be insufficient, because the dependency when the
switch lib _is_ compiled is problematic too.
We cannot declare ocelot_can_inject() and ocelot_port_inject_frame() as
static inline functions, because these access I/O functions like
__ocelot_write_ix() which is called by ocelot_write_rix(). Making those
static inline basically means exposing the whole guts of the ocelot
switch library, not ideal...
We already have one tagging protocol driver which calls into the switch
driver during xmit but not using any exported symbol: sja1105_defer_xmit.
We can do the same thing here: create a kthread worker and one work item
per skb, and let the switch driver itself do the register accesses to
send the skb, and then consume it.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Reported-by: Michael Walle <michael@walle.cc> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
As explained here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocol drivers cannot depend on symbols exported by switch
drivers, because this creates a circular dependency that breaks module
autoloading.
The tag_ocelot.c file depends on the ocelot_ptp_rew_op() function
exported by the common ocelot switch lib. This function looks at
OCELOT_SKB_CB(skb) and computes how to populate the REW_OP field of the
DSA tag, for PTP timestamping (the command: one-step/two-step, and the
TX timestamp identifier).
None of that requires deep insight into the driver, it is quite
stateless, as it only depends upon the skb->cb. So let's make it a
static inline function and put it in include/linux/dsa/ocelot.h, a
file that despite its name is used by the ocelot switch driver for
populating the injection header too - since commit 40d3f295b5fe ("net:
mscc: ocelot: use common tag parsing code with DSA").
With that function declared as static inline, its body is expanded
inside each call site, so the dependency is broken and the DSA tagger
can be built without the switch library, upon which the felix driver
depends.
Fixes: 39e5308b3250 ("net: mscc: ocelot: support PTP Sync one-step timestamping") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:39 +0000 (14:40 +0300)]
net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
The sad reality is that when a PTP frame with a TX timestamping request
is transmitted, it isn't guaranteed that it will make it all the way to
the wire (due to congestion inside the switch), and that a timestamp
will be taken by the hardware and placed in the timestamp FIFO where an
IRQ will be raised for it.
The implication is that if enough PTP frames are silently dropped by the
hardware such that the timestamp ID has rolled over, it is possible to
match a timestamp to an old skb.
Furthermore, nobody will match on the real skb corresponding to this
timestamp, since we stupidly matched on a previous one that was stale in
the queue, and stopped there.
So PTP timestamping will be broken and there will be no way to recover.
It looks like the hardware parses the sequenceID from the PTP header,
and also provides that metadata for each timestamp. The driver currently
ignores this, but it shouldn't.
As an extra resiliency measure, do the following:
- check whether the PTP sequenceID also matches between the skb and the
timestamp, treat the skb as stale otherwise and free it
- if we see a stale skb, don't stop there and try to match an skb one
more time, chances are there's one more skb in the queue with the same
timestamp ID, otherwise we wouldn't have ever found the stale one (it
is by timestamp ID that we matched it).
While this does not prevent PTP packet drops, it at least prevents
the catastrophic consequences of incorrect timestamp matching.
Since we already call ptp_classify_raw in the TX path, save the result
in the skb->cb of the clone, and just use that result in the interrupt
code path.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:38 +0000 (14:40 +0300)]
net: mscc: ocelot: deny TX timestamping of non-PTP packets
It appears that Ocelot switches cannot timestamp non-PTP frames,
I tested this using the isochron program at:
https://github.com/vladimiroltean/tsn-scripts
with the result that the driver increments the ocelot_port->ts_id
counter as expected, puts it in the REW_OP, but the hardware seems to
not timestamp these packets at all, since no IRQ is emitted.
Therefore check whether we are sending PTP frames, and refuse to
populate REW_OP otherwise.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:37 +0000 (14:40 +0300)]
net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
When skb_match is NULL, it means we received a PTP IRQ for a timestamp
ID that the kernel has no idea about, since there is no skb in the
timestamping queue with that timestamp ID.
This is a grave error and not something to just "continue" over.
So print a big warning in case this happens.
Also, move the check above ocelot_get_hwtimestamp(), there is no point
in reading the full 64-bit current PTP time if we're not going to do
anything with it anyway for this skb.
Vladimir Oltean [Tue, 12 Oct 2021 11:40:36 +0000 (14:40 +0300)]
net: mscc: ocelot: avoid overflowing the PTP timestamp FIFO
PTP packets with 2-step TX timestamp requests are matched to packets
based on the egress port number and a 6-bit timestamp identifier.
All PTP timestamps are held in a common FIFO that is 128 entry deep.
This patch ensures that back-to-back timestamping requests cannot exceed
the hardware FIFO capacity. If that happens, simply send the packets
without requesting a TX timestamp to be taken (in the case of felix,
since the DSA API has a void return code in ds->ops->port_txtstamp) or
drop them (in the case of ocelot).
I've moved the ts_id_lock from a per-port basis to a per-switch basis,
because we need separate accounting for both numbers of PTP frames in
flight. And since we need locking to inc/dec the per-switch counter,
that also offers protection for the per-port counter and hence there is
no reason to have a per-port counter anymore.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Tue, 12 Oct 2021 11:40:35 +0000 (14:40 +0300)]
net: mscc: ocelot: make use of all 63 PTP timestamp identifiers
At present, there is a problem when user space bombards a port with PTP
event frames which have TX timestamping requests (or when a tc-taprio
offload is installed on a port, which delays the TX timestamps by a
significant amount of time). The driver will happily roll over the 2-bit
timestamp ID and this will cause incorrect matches between an skb and
the TX timestamp collected from the FIFO.
The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
63 is reserved, so that leaves identifiers 0-62 to be used.
The timestamp identifiers are selected by the REW_OP packet field, and
are actually shared between CPU-injected frames and frames which match a
VCAP IS2 rule that modifies the REW_OP. The hardware supports
partitioning between the two uses of the REW_OP field through the
PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.
The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
ocelot_init_timestamp() which makes all timestamp identifiers available
to CPU injection.
Therefore, we can make use of all 63 timestamp identifiers, which should
allow more timestampable packets to be in flight on each port. This is
only part of the solution, more issues will be addressed in future changes.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Fix circular dependency between sja1105 and tag_sja1105
As discussed here:
https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
DSA tagging protocols cannot use symbols exported by switch drivers.
Eliminate the two instances of that from tag_sja1105, and that allows us
to have a working setup with modules again.
====================
Re-applying to net, this was mistakenly applied to net-next,
see first Link.
Vladimir Oltean [Wed, 22 Sep 2021 14:37:26 +0000 (17:37 +0300)]
net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.
The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.
Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.
Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.
Vladimir Oltean [Wed, 22 Sep 2021 14:37:25 +0000 (17:37 +0300)]
net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.
The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.
So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).
On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.
DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.
When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.
The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.
To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).
With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.
Baowen Zheng [Tue, 12 Oct 2021 12:48:50 +0000 (14:48 +0200)]
nfp: flow_offload: move flow_indr_dev_register from app init to app start
In commit 74fc4f828769 ("net: Fix offloading indirect devices dependency
on qdisc order creation"), it adds a process to trigger the callback to
setup the bo callback when the driver regists a callback.
In our current implement, we are not ready to run the callback when nfp
call the function flow_indr_dev_register, then there will be error
message as:
net/mlx5e: Fix division by 0 in mlx5e_select_queue for representors
Commit 846d6da1fcdb ("net/mlx5e: Fix division by 0 in
mlx5e_select_queue") makes mlx5e_build_nic_params assign a non-zero
initial value to priv->num_tc_x_num_ch, so that mlx5e_select_queue
doesn't fail with division by 0 if called before the first activation of
channels. However, the initialization flow of representors doesn't call
mlx5e_build_nic_params, so this bug can still happen with representors.
This commit fixes the bug by adding the missing assignment to
mlx5e_build_rep_params.
Aya Levin [Sun, 26 Sep 2021 14:55:41 +0000 (17:55 +0300)]
net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp opposite to RX-FCS configuration.
Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Saeed Mahameed [Tue, 5 Oct 2021 04:20:25 +0000 (21:20 -0700)]
net/mlx5e: Switchdev representors are not vlan challenged
Before this patch, mlx5 representors advertised the
NETIF_F_VLAN_CHALLENGED bit, this could lead to missing features when
using reps with vxlan/bridge and maybe other virtual interfaces,
when such interfaces inherit this bit and block vlan usage in their
topology.
Example:
$ip link add dev bridge type bridge
# add representor interface to the bridge
$ip link set dev pf0hpf master
$ip link add link bridge name vlan10 type vlan id 10 protocol 802.1q
Error: 8021q: VLANs not supported on device.
Reps are perfectly capable of handling vlan traffic, although they don't
implement vlan_{add,kill}_vid ndos, hence, remove
NETIF_F_VLAN_CHALLENGED advertisement.
Valentine Fatiev [Sun, 15 Aug 2021 14:43:19 +0000 (17:43 +0300)]
net/mlx5e: Fix memory leak in mlx5_core_destroy_cq() error path
Prior to this patch in case mlx5_core_destroy_cq() failed it returns
without completing all destroy operations and that leads to memory leak.
Instead, complete the destroy flow before return error.
Also move mlx5_debug_cq_remove() to the beginning of mlx5_core_destroy_cq()
to be symmetrical with mlx5_core_create_cq().
Currently, bridge cleanup is calling to cancel_delayed_work(). When this
function is finished, there is a chance that the delayed work is still
running. Also, the delayed work is queueing itself.
As a result, we might execute the delayed work after the bridge cleanup
have finished and hit a null-ptr oops[1].
Fix it by using cancel_delayed_work_sync(), which is waiting until the
work is done and will cancel the queue work.
Jonas Hahnfeld [Tue, 12 Oct 2021 20:09:07 +0000 (22:09 +0200)]
ALSA: usb-audio: Add quirk for VF0770
The device advertises 8 formats, but only a rate of 48kHz is honored
by the hardware and 24 bits give chopped audio, so only report the
one working combination. This fixes out-of-the-box audio experience
with PipeWire which otherwise attempts to choose S24_3LE (while
PulseAudio defaulted to S16_LE).
Linus Torvalds [Tue, 12 Oct 2021 18:16:38 +0000 (11:16 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix CMA gigantic page order for 16K/64K page sizes
- Fix section mismatch error in drivers/acpi/arm64/gtdt.c
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
acpi/arm64: fix next_platform_timer() section mismatch error
arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE
Linus Torvalds [Tue, 12 Oct 2021 18:11:39 +0000 (11:11 -0700)]
Merge tag 'platform-drivers-x86-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"A second (small) set of pdx86 bug-fixes and new hardware ids for 5.15"
* tag 'platform-drivers-x86-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: int1092: Fix non sequential device mode handling
platform/x86: intel_skl_int3472: Correct null check
platform/x86: gigabyte-wmi: add support for B550 AORUS ELITE AX V2
platform/x86: amd-pmc: Add alternative acpi id for PMC controller
platform/x86: intel_scu_ipc: Update timeout value in comment
platform/x86: intel_scu_ipc: Increase virtual timeout to 10s
platform/x86: intel_scu_ipc: Fix busy loop expiry time
platform/x86: dell: Make DELL_WMI_PRIVACY depend on DELL_WMI
platform/mellanox: mlxreg-io: Fix read access of n-bytes size attributes
platform/mellanox: mlxreg-io: Fix argument base in kstrtou32() call
Fix modpost Section mismatch error in next_platform_timer().
[...]
WARNING: modpost: vmlinux.o(.text.unlikely+0x26e60): Section mismatch in reference from the function next_platform_timer() to the variable .init.data:acpi_gtdt_desc
The function next_platform_timer() references
the variable __initdata acpi_gtdt_desc.
This is often because next_platform_timer lacks a __initdata
annotation or the annotation of acpi_gtdt_desc is wrong.
WARNING: modpost: vmlinux.o(.text.unlikely+0x26e64): Section mismatch in reference from the function next_platform_timer() to the variable .init.data:acpi_gtdt_desc
The function next_platform_timer() references
the variable __initdata acpi_gtdt_desc.
This is often because next_platform_timer lacks a __initdata
annotation or the annotation of acpi_gtdt_desc is wrong.
Kai Vehmanen [Tue, 12 Oct 2021 14:29:35 +0000 (17:29 +0300)]
ALSA: hda: avoid write to STATESTS if controller is in reset
The snd_hdac_bus_reset_link() contains logic to clear STATESTS register
before performing controller reset. This code dates back to an old
bugfix in commit e8a7f136f5ed ("[ALSA] hda-intel - Improve HD-audio
codec probing robustness"). Originally the code was added to
azx_reset().
The code was moved around in commit a41d122449be ("ALSA: hda - Embed bus
into controller object") and ended up to snd_hdac_bus_reset_link() and
called primarily via snd_hdac_bus_init_chip().
The logic to clear STATESTS is correct when snd_hdac_bus_init_chip() is
called when controller is not in reset. In this case, STATESTS can be
cleared. This can be useful e.g. when forcing a controller reset to retry
codec probe. A normal non-power-on reset will not clear the bits.
However, this old logic is problematic when controller is already in
reset. The HDA specification states that controller must be taken out of
reset before writing to registers other than GCTL.CRST (1.0a spec,
3.3.7). The write to STATESTS in snd_hdac_bus_reset_link() will be lost
if the controller is already in reset per the HDA specification mentioned.
This has been harmless on older hardware. On newer generation of Intel
PCIe based HDA controllers, if configured to report issues, this write
will emit an unsupported request error. If ACPI Platform Error Interface
(APEI) is enabled in kernel, this will end up to kernel log.
Fix the code in snd_hdac_bus_reset_link() to only clear the STATESTS if
the function is called when controller is not in reset. Otherwise
clearing the bits is not possible and should be skipped.
Hui Wang [Tue, 12 Oct 2021 11:47:48 +0000 (19:47 +0800)]
ALSA: hda/realtek: Fix the mic type detection issue for ASUS G551JW
We need to define the codec pin 0x1b to be the mic, but somehow
the mic doesn't support hot plugging detection, and Windows also has
this issue, so we set it to phantom headset-mic.
Also the determine_headset_type() often returns the omtp type by a
mistake when we plug a ctia headset, this makes the mic can't record
sound at all. Because most of the headset are ctia type nowadays and
some machines have the fixed ctia type audio jack, it is possible this
machine has the fixed ctia jack too. Here we set this mic jack to
fixed ctia type, this could avoid the mic type detection mistake and
make the ctia headset work stable.
Jacob Keller [Mon, 11 Oct 2021 20:48:06 +0000 (13:48 -0700)]
ice: fix locking for Tx timestamp tracking flush
Commit 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush")
added a lock around the Tx timestamp tracker flow which is used to
cleanup any left over SKBs and prepare for device removal.
This lock is problematic because it is being held around a call to
ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
write command to firmware. This could lead to a deadlock if the mutex
actually sleeps, and causes the following warning on a kernel with
preemption debugging enabled:
Notice that this is the only case where we use the lock in this way. In
the cleanup kthread and work kthread the lock is only taken around the
bit accesses. This was done intentionally to avoid this kind of issue.
The way the lock is used, we only protect ordering of bit sets vs bit
clears. The Tx writers in the hot path don't need to be protected
against the entire kthread loop. The Tx queues threads only need to
ensure that they do not re-use an index that is currently in use. The
cleanup loop does not need to block all new set bits, since it will
re-queue itself if new timestamps are present.
Fix the tracker flow so that it uses the same flow as the standard
cleanup thread. In addition, ensure the in_use bitmap actually gets
cleared properly.
This fixes the warning and also avoids the potential deadlock that might
have occurred otherwise.
Fixes: 4dd0d5c33c3e ("ice: add lock around Tx timestamp tracker flush") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Oct 2021 10:49:50 +0000 (11:49 +0100)]
Merge branch 'ioam-fixes'
Justin Iurman says:
====================
Correct the IOAM behavior for undefined trace type bits
(@Jakub @David: there will be a conflict for #2 when merging net->net-next, due
to commit [1]. The conflict is only 5-10 lines for #2 (#1 should be fine) inside
the file tools/testing/selftests/net/ioam6.sh, so quite short though possibly
ugly. Sorry for that, I didn't expect to post this one... Had I known, I'd have
made the opposite.)
Modify both the input and output behaviors regarding the trace type when one of
the undefined bits is set. The goal is to keep the interoperability when new
fields (aka new bits inside the range 12-21) will be defined.
The draft [2] says the following:
---------------------------------------------------------------
"Bit 12-21 Undefined. These values are available for future
assignment in the IOAM Trace-Type Registry (Section 8.2).
Every future node data field corresponding to one of
these bits MUST be 4-octets long. An IOAM encapsulating
node MUST set the value of each undefined bit to 0. If
an IOAM transit node receives a packet with one or more
of these bits set to 1, it MUST either:
1. Add corresponding node data filled with the reserved
value 0xFFFFFFFF, after the node data fields for the
IOAM-Trace-Type bits defined above, such that the
total node data added by this node in units of
4-octets is equal to NodeLen, or
2. Not add any node data fields to the packet, even for
the IOAM-Trace-Type bits defined above."
---------------------------------------------------------------
The output behavior has been modified to respect the fact that "an IOAM encap
node MUST set the value of each undefined bit to 0" (i.e., undefined bits can't
be set anymore).
As for the input behavior, current implementation is based on the second choice
(i.e., "not add any data fields to the packet [...]"). With this solution, any
interoperability is lost (i.e., if a new bit is defined, then an "old" kernel
implementation wouldn't fill IOAM data when such new bit is set inside the trace
type).
The input behavior is therefore relaxed and these undefined bits are now allowed
to be set. It is only possible thanks to the sentence "every future node data
field corresponding to one of these bits MUST be 4-octets long". Indeed, the
default empty value (the one for 4-octet fields) is inserted whenever an
undefined bit is set.
Justin Iurman [Mon, 11 Oct 2021 18:04:11 +0000 (20:04 +0200)]
ipv6: ioam: move the check for undefined bits
The check for undefined bits in the trace type is moved from the input side to
the output side, while the input side is relaxed and now inserts default empty
values when an undefined bit is set.
Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Arun Ramadoss [Mon, 11 Oct 2021 15:48:08 +0000 (21:18 +0530)]
net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
When the ksz module is installed and removed using rmmod, kernel crashes
with null pointer dereferrence error. During rmmod, ksz_switch_remove
function tries to cancel the mib_read_workqueue using
cancel_delayed_work_sync routine and unregister switch from dsa.
During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
reschedules the workqueue since mib_interval is non-zero.
Due to which queue executed after mib_interval and it tries to access
dp->slave. But the slave is unregistered in the ksz_switch_remove
function. Hence kernel crashes.
To avoid this crash, before canceling the workqueue, resetted the
mib_interval to 0.
v1 -> v2:
-Removed the if condition in ksz_mib_read_work
Fixes: 469b390e1ba3 ("net: dsa: microchip: use delayed_work instead of timer + work") Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vegard Nossum [Mon, 11 Oct 2021 15:22:49 +0000 (17:22 +0200)]
r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256
Fix the following build/link errors by adding a dependency on
CRYPTO, CRYPTO_HASH, CRYPTO_SHA256 and CRC32:
ld: drivers/net/usb/r8152.o: in function `rtl8152_fw_verify_checksum':
r8152.c:(.text+0x2b2a): undefined reference to `crypto_alloc_shash'
ld: r8152.c:(.text+0x2bed): undefined reference to `crypto_shash_digest'
ld: r8152.c:(.text+0x2c50): undefined reference to `crypto_destroy_tfm'
ld: drivers/net/usb/r8152.o: in function `_rtl8152_set_rx_mode':
r8152.c:(.text+0xdcb0): undefined reference to `crc32_le'
Fixes: 9370f2d05a2a1 ("r8152: support request_firmware for RTL8153") Fixes: ac718b69301c7 ("net/usb: new driver for RTL8152") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maarten Zanders [Mon, 11 Oct 2021 14:27:20 +0000 (16:27 +0200)]
net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's
mv88e6xxx_port_ppu_updates() interpretes data in the PORT_STS
register incorrectly for internal ports (ie no PPU). In these
cases, the PHY_DETECT bit indicates link status. This results
in forcing the MAC state whenever the PHY link goes down which
is not intended. As a side effect, LED's configured to show
link status stay lit even though the physical link is down.
Add a check in mac_link_down and mac_link_up to see if it
concerns an external port and only then, look at PPU status.
Fixes: 5d5b231da7ac (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down) Reported-by: Maarten Zanders <m.zanders@televic.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Maarten Zanders <maarten.zanders@mind.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Wan Jiabing [Mon, 11 Oct 2021 02:27:41 +0000 (10:27 +0800)]
net: mscc: ocelot: Fix dumplicated argument in ocelot
Fix the following coccicheck warning:
drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument
to & or |
These DEV_CLOCK_CFG_MAC_TX_RST are duplicate here.
Here should be DEV_CLOCK_CFG_MAC_RX_RST.
Fixes: e6e12df625f2 ("net: mscc: ocelot: convert to phylink") Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Boyd [Fri, 8 Oct 2021 21:59:45 +0000 (14:59 -0700)]
af_unix: Rename UNIX-DGRAM to UNIX to maintain backwards compatability
Then name of this protocol changed in commit 94531cfcbe79 ("af_unix: Add
unix_stream_proto for sockmap") because that commit added stream support
to the af_unix protocol. Renaming the existing protocol makes a ChromeOS
protocol test[1] fail now that the name has changed in
/proc/net/protocols from "UNIX" to "UNIX-DGRAM".
Let's put the name back to how it was while keeping the stream protocol
as "UNIX-STREAM" so that the procfs interface doesn't change. This fixes
the test and maintains backwards compatibility in proc.
Linus Torvalds [Tue, 12 Oct 2021 00:25:08 +0000 (17:25 -0700)]
Merge tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kunit fixes from Shuah Khan:
- Fixes to address the structleak plugin causing the stack frame size
to grow immensely when used with KUnit. Fixes include adding a new
makefile to disable structleak and using it from KUnit iio, device
property, thunderbolt, and bitfield tests to disable it.
- KUnit framework reference count leak in kfree_at_end
- KUnit tool fix to resolve conflict between --json and --raw_output
and generate correct test output in either case.
- kernel-doc warnings due to mismatched arg names
* tag 'linux-kselftest-kunit-fixes-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: fix kernel-doc warnings due to mismatched arg names
bitfield: build kunit tests without structleak plugin
thunderbolt: build kunit tests without structleak plugin
device property: build kunit tests without structleak plugin
iio/test-format: build kunit tests without structleak plugin
gcc-plugins/structleak: add makefile var for disabling structleak
kunit: fix reference count leak in kfree_at_end
kunit: tool: better handling of quasi-bool args (--json, --raw_output)
Linus Torvalds [Tue, 12 Oct 2021 00:16:41 +0000 (17:16 -0700)]
Merge branch 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"All documentation / comment updates"
* 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroupv2, docs: fix misinformation in "device controller" section
cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
docs/cgroup: remove some duplicate words
Linus Torvalds [Mon, 11 Oct 2021 23:59:49 +0000 (16:59 -0700)]
Merge branch 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"One patch to add a missing __printf annotation and the other to enable
deferred printing for debug dumps to avoid deadlocks when triggered
from some contexts (e.g. console drivers)"
* 'for-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix state-dump console deadlock
workqueue: annotate alloc_workqueue() as printf
Linus Torvalds [Mon, 11 Oct 2021 23:48:19 +0000 (16:48 -0700)]
Merge tag 'for-5.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more error handling fixes, stemming from code inspection, error
injection or fuzzing"
* tag 'for-5.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix abort logic in btrfs_replace_file_extents
btrfs: check for error when looking up inode during dir entry replay
btrfs: unify lookup return value when dir entry is missing
btrfs: deal with errors when adding inode reference during log replay
btrfs: deal with errors when replaying dir entry during log replay
btrfs: deal with errors when checking if a dir entry exists during log replay
btrfs: update refs for any root except tree log roots
btrfs: unlock newly allocated extent buffer after error
Mike Kravetz [Tue, 5 Oct 2021 20:25:29 +0000 (13:25 -0700)]
arm64/hugetlb: fix CMA gigantic page order for non-4K PAGE_SIZE
For non-4K PAGE_SIZE configs, the largest gigantic huge page size is
CONT_PMD_SHIFT order. On arm64 with 64K PAGE_SIZE, the gigantic page is
16G. Therefore, one should be able to specify 'hugetlb_cma=16G' on the
kernel command line so that one gigantic page can be allocated from CMA.
However, when adding such an option the following message is produced:
hugetlb_cma: cma area should be at least 8796093022208 MiB
This is because the calculation for non-4K gigantic page order is
incorrect in the arm64 specific routine arm64_hugetlb_cma_reserve().
Fixes: abb7962adc80 ("arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs") Cc: <stable@vger.kernel.org> # 5.9.x Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20211005202529.213812-1-mike.kravetz@oracle.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Johan Hovold [Wed, 6 Oct 2021 11:58:52 +0000 (13:58 +0200)]
workqueue: fix state-dump console deadlock
Console drivers often queue work while holding locks also taken in their
console write paths, something which can lead to deadlocks on SMP when
dumping workqueue state (e.g. sysrq-t or on suspend failures).
where workqueues are, for example, used to push data to the line
discipline, process break signals and handle modem-status changes. Line
disciplines and serdev drivers can also queue work on write-wakeup
notifications, etc.
Reworking every console driver to avoid queuing work while holding locks
also taken in their write paths would complicate drivers and is neither
desirable or feasible.
Instead use the deferred-printk mechanism to avoid printing while
holding pool locks when dumping workqueue state.
Note that there are a few WARN_ON() assertions in the workqueue code
which could potentially also trigger a deadlock. Hopefully the ongoing
printk rework will provide a general solution for this eventually.
This was originally reported after a lockdep splat when executing
sysrq-t with the imx serial driver.
Takashi Iwai [Sun, 10 Oct 2021 07:55:46 +0000 (09:55 +0200)]
ALSA: pcm: Workaround for a wrong offset in SYNC_PTR compat ioctl
Michael Forney reported an incorrect padding type that was defined in
the commit 80fe7430c708 ("ALSA: add new 32-bit layout for
snd_pcm_mmap_status/control") for PCM control mmap data.
His analysis is correct, and this caused the misplacements of PCM
control data on 32bit arch and 32bit compat mode.
The bug is that the __pad2 definition in __snd_pcm_mmap_control64
struct was wrongly with __pad_before_uframe, which should have been
__pad_after_uframe instead. This struct is used in SYNC_PTR ioctl and
control mmap. Basically this bug leads to two problems:
- The offset of avail_min field becomes wrong, it's placed right after
appl_ptr without padding on little-endian
- When appl_ptr and avail_min are read as 64bit values in kernel side,
the values become either zero or corrupted (mixed up)
One good news is that, because both user-space and kernel
misunderstand the wrong offset, at least, 32bit application running on
32bit kernel works as is. Also, 64bit applications are unaffected
because the padding size is zero. The remaining problem is the 32bit
compat mode; as mentioned in the above, avail_min is placed right
after appl_ptr on little-endian archs, 64bit kernel reads bogus values
for appl_ptr updates, which may lead to streaming bugs like jumping,
XRUN or whatever unexpected.
(However, we haven't heard any serious bug reports due to this over
years, so practically seen, it's fairly safe to assume that the impact
by this bug is limited.)
Ideally speaking, we should correct the wrong mmap status control
definition. But this would cause again incompatibility with the
existing binaries, and fixing it (e.g. by renumbering ioctls) would be
really messy.
So, as of this patch, we only correct the behavior of 32bit compat
mode and keep the rest as is. Namely, the SYNC_PTR ioctl is now
handled differently in compat mode to read/write the 32bit values at
the right offsets. The control mmap of 32bit apps on 64bit kernels
has been already disabled (which is likely rather an overlook, but
this worked fine at this time :), so covering SYNC_PTR ioctl should
suffice as a fallback.
The int3472-discrete driver can enter an error path after initialising
int3472->clock.ena_gpio, but before it has registered the clock. This will
cause a NULL pointer dereference, because clkdev_drop() is not null aware.
Instead of guarding the call to skl_int3472_unregister_clock() by checking
for .ena_gpio, check specifically for the presence of the clk_lookup, which
will guarantee clkdev_create() has already been called.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214453 Fixes: 7540599a5ef1 ("platform/x86: intel_skl_int3472: Provide skl_int3472_unregister_clock()") Signed-off-by: Daniel Scally <djrscally@gmail.com> Link: https://lore.kernel.org/r/20211008224608.415949-1-djrscally@gmail.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Sachi King [Sat, 2 Oct 2021 04:18:39 +0000 (14:18 +1000)]
platform/x86: amd-pmc: Add alternative acpi id for PMC controller
The Surface Laptop 4 AMD has used the AMD0005 to identify this
controller instead of using the appropriate ACPI ID AMDI0005. Include
AMD0005 in the acpi id list.
platform/x86: intel_scu_ipc: Update timeout value in comment
The comment decribing the IPC timeout hadn't been updated when the
actual timeout was changed from 3 to 5 seconds in
commit a7d53dbbc70a ("platform/x86: intel_scu_ipc: Increase virtual
timeout from 3 to 5 seconds") .
Since the value is anyway updated to 10s now, take this opportunity to
update the value in the comment too.
Signed-off-by: Prashant Malani <pmalani@chromium.org> Cc: Benson Leung <bleung@chromium.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20210928101932.2543937-4-pmalani@chromium.org Signed-off-by: Hans de Goede <hdegoede@redhat.com>
platform/x86: intel_scu_ipc: Increase virtual timeout to 10s
Commit a7d53dbbc70a ("platform/x86: intel_scu_ipc: Increase virtual
timeout from 3 to 5 seconds") states that the recommended timeout range
is 5-10 seconds. Adjust the timeout value to the higher of those i.e 10
seconds, to account for situations where the 5 seconds is insufficient
for disconnect command success.
Signed-off-by: Prashant Malani <pmalani@chromium.org> Cc: Benson Leung <bleung@chromium.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20210928101932.2543937-3-pmalani@chromium.org Signed-off-by: Hans de Goede <hdegoede@redhat.com>
platform/x86: intel_scu_ipc: Fix busy loop expiry time
The macro IPC_TIMEOUT is already in jiffies (it is also used like that
elsewhere in the file when calling wait_for_completion_timeout()). Don’t
convert it using helper functions for the purposes of calculating the
busy loop expiry time.
Fixes: e7b7ab3847c9 (“platform/x86: intel_scu_ipc: Sleeping is fine when polling”) Signed-off-by: Prashant Malani <pmalani@chromium.org> Cc: Benson Leung <bleung@chromium.org> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://lore.kernel.org/r/20210928101932.2543937-2-pmalani@chromium.org Signed-off-by: Hans de Goede <hdegoede@redhat.com>
platform/mellanox: mlxreg-io: Fix argument base in kstrtou32() call
Change kstrtou32() argument 'base' to be zero instead of 'len'.
It works by chance for setting one bit value, but it is not supposed to
work in case value passed to mlxreg_io_attr_store() is greater than 1.
It works for example, for:
echo 1 > /sys/devices/platform/mlxplat/mlxreg-io/hwmon/.../jtag_enable
But it will fail for:
echo n > /sys/devices/platform/mlxplat/mlxreg-io/hwmon/.../jtag_enable,
where n > 1.
The flow for input buffer conversion is as below:
_kstrtoull(const char *s, unsigned int base, unsigned long long *res)
calls:
rv = _parse_integer(s, base, &_res);
For the second case, where n > 1:
- _parse_integer() converts 's' to 'val'.
For n=2, 'len' is set to 2 (string buffer is 0x32 0x0a), for n=3
'len' is set to 3 (string buffer 0x33 0x0a), etcetera.
- 'base' is equal or greater then '2' (length of input buffer).
As a result, _parse_integer() exits with result zero (rv):
rv = 0;
while (1) {
...
if (val >= base)-> (2 >= 2)
break;
...
rv++;
...
}
And _kstrtoull() in their turn will fail:
if (rv == 0)
return -EINVAL;
Fixes: 5ec4a8ace06c ("platform/mellanox: Introduce support for Mellanox register access driver") Signed-off-by: Vadim Pasternak <vadimp@nvidia.com> Link: https://lore.kernel.org/r/20210927142214.2613929-2-vadimp@nvidia.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
ALSA: hda/realtek: Fix for quirk to enable speaker output on the Lenovo 13s Gen2
The previous patch's HDA verb initialization for the Lenovo 13s
sequence was slightly off. This updated verb sequence has been tested
and confirmed working.
Linus Torvalds [Sun, 10 Oct 2021 17:12:42 +0000 (10:12 -0700)]
Merge tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A bit of a big batch, partly because I didn't send any last week, and
also just because the BPF fixes happened to land this week.
Summary:
- Fix a regression hit by the IPR SCSI driver, introduced by the
recent addition of MSI domains on pseries.
- A big series including 8 BPF fixes, some with potential security
impact and the rest various code generation issues.
- Fix our program check assembler entry path, which was accidentally
jumping into a gas macro and generating strange stack frames, which
could confuse find_bug().
- A couple of fixes, and related changes, to fix corner cases in our
machine check handling.
- Fix our DMA IOMMU ops, which were not always returning the optimal
DMA mask, leading to at least one device falling back to 32-bit DMA
when it shouldn't.
- A fix for KUAP handling on 32-bit Book3S.
- Fix crashes seen when kdumping on some pseries systems.
Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric
Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem,
Christoph Hellwig, Johan Almbladh, Stan Johnson"
* tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init
powerpc/32s: Fix kuap_kernel_restore()
powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler
powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
powerpc/64: warn if local irqs are enabled in NMI or hardirq context
powerpc/traps: do not enable irqs in _exception
powerpc/64s: fix program check interrupt emergency stack path
powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END
powerpc/bpf ppc32: Fix JMP32_JSET_K
powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/bpf: Fix BPF_MOD when imm == 1
powerpc/bpf: Validate branch ranges
powerpc/lib: Add helper to check if offset is within conditional branch range
powerpc/iommu: Report the correct most efficient DMA mask for PCI devices
Linus Torvalds [Sun, 10 Oct 2021 17:05:39 +0000 (10:05 -0700)]
Merge tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Borislav Petkov:
- Remove an extra section.len member in favour of section.sh_size
- Align .altinstructions section creation with the kernel's by creating
them with entry size of 0
- Fix objtool to convert a reloc symbol to a section offset and not to
not warn about not knowing how
* tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Remove redundant 'len' field from struct section
objtool: Make .altinstructions section entry size consistent
objtool: Remove reloc symbol type checks in get_alt_entry()
Linus Torvalds [Sun, 10 Oct 2021 17:00:51 +0000 (10:00 -0700)]
Merge tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A FPU fix to properly handle invalid MXCSR values: 32-bit masks them
out due to historical reasons and 64-bit kernels reject them
- A fix to clear X86_FEATURE_SMAP when support for is not
config-enabled
- Three fixes correcting misspelled Kconfig symbols used in code
- Two resctrl object cleanup fixes
- Yet another attempt at fixing the neverending saga of botched x86
timers, this time because some incredibly smart hardware decides to
turn off the HPET timer in a low power state - who cares if the OS is
relying on it...
- Check the full return value range of an SEV VMGEXIT call to determine
whether it returned an error
* tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Restore the masking out of reserved MXCSR bits
x86/Kconfig: Correct reference to MWINCHIP3D
x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI
x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=n
x86/entry: Correct reference to intended CONFIG_64_BIT
x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
x86/hpet: Use another crystalball to evaluate HPET usability
x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]
Linus Torvalds [Sat, 9 Oct 2021 22:03:48 +0000 (15:03 -0700)]
Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Three driver bugfixes and one leak fix for the core"
* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mlxcpld: Modify register setting for 400KHz frequency
i2c: mlxcpld: Fix criteria for frequency setting
i2c: mediatek: Add OFFSET_EXT_CONF setting back
i2c: acpi: fix resource leak in reconfiguration device addition
Linus Torvalds [Sat, 9 Oct 2021 21:51:59 +0000 (14:51 -0700)]
Merge tag 'block-5.15-2021-10-09' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Two small fixes for this release:
- Add missing QUEUE_FLAG_HCTX_ACTIVE in the debugfs handling
(Johannes)
- Fix double free / UAF issue in __alloc_disk_node (Tetsuo)"
* tag 'block-5.15-2021-10-09' of git://git.kernel.dk/linux-block:
block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output
block: genhd: fix double kfree() in __alloc_disk_node()
Linus Torvalds [Sat, 9 Oct 2021 17:17:17 +0000 (10:17 -0700)]
Merge tag '5.15-rc4-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Six fixes for the ksmbd kernel server, including two additional
overflow checks, a fix for oops, and some cleanup (e.g. remove dead
code for less secure dialects that has been removed)"
* tag '5.15-rc4-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix oops from fuse driver
ksmbd: fix version mismatch with out of tree
ksmbd: use buf_data_size instead of recalculation in smb3_decrypt_req()
ksmbd: remove the leftover of smb2.0 dialect support
ksmbd: check strictly data area in ksmbd_smb2_check_message()
ksmbd: add the check to vaildate if stream protocol length exceeds maximum value
Linus Torvalds [Sat, 9 Oct 2021 16:07:58 +0000 (09:07 -0700)]
Merge tag 'riscv-for-linus-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A pair of fixes (along with the necessory cleanup) to our VDSO, to
avoid a locking during OOM and to prevent the text from overflowing
into the data page
- A fix to checksyscalls to teach it about our rv32 UABI
- A fix to add clone3() to the rv32 UABI, which was pointed out by
checksyscalls
- A fix to properly flush the icache on the local CPU in addition to
the remote CPUs
* tag 'riscv-for-linus-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
checksyscalls: Unconditionally ignore fstat{,at}64
riscv: Flush current cpu icache before other cpus
RISC-V: Include clone3() on rv32
riscv/vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
riscv/vdso: Move vdso data page up front
riscv/vdso: Refactor asm/vdso.h
Xuan Zhuo [Sat, 9 Oct 2021 09:17:53 +0000 (05:17 -0400)]
virtio-net: fix for skb_over_panic inside big mode
commit 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
accidentally reverted the effect of
commit 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode")
on drivers/net/virtio_net.c
As a result, users of crosvm (which is using large packet mode)
are experiencing crashes with 5.14-rc1 and above that do not
occur with 5.13.
Apply the original 1a8024239da ("virtio-net: fix for skb_over_panic inside big mode")
again, the original logic still holds:
In virtio-net's large packet mode, there is a hole in the space behind
buf.
hdr_padded_len - hdr_len
We must take this into account when calculating tailroom.
Cc: Greg KH <gregkh@linuxfoundation.org> Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Fixes: 126285651b7f ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Corentin Noël <corentin.noel@collabora.com> Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In case a PHY device was probed thus in the PHY_READY state, but not
configured and with no network device attached yet, we should not be
trying to shut it down because it has been brought back into reset by
phy_device_reset() towards the end of phy_probe() and anyway we have not
configured the PHY yet.
Fixes: e2f016cf7751 ("net: phy: add a shutdown procedure") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 9 Oct 2021 12:26:07 +0000 (15:26 +0300)]
net: dsa: hold rtnl_lock in dsa_switch_setup_tag_protocol
It was a documented fact that ds->ops->change_tag_protocol() offered
rtnetlink mutex protection to the switch driver, since there was an
ASSERT_RTNL right before the call in dsa_switch_change_tag_proto()
(initiated from sysfs).
The blamed commit introduced another call path for
ds->ops->change_tag_protocol() which does not hold the rtnl_mutex.
This is:
The reason why the rtnl_mutex is held in the sysfs call path is to
ensure that, once the master and all the DSA interfaces are down (which
is required so that no packets flow), they remain down during the
tagging protocol change.
The above calling order illustrates the fact that it should not be risky
to change the initial tagging protocol to the one specified in the
device tree at the given time:
- packets cannot enter the dsa_switch_rcv() packet type handler since
netdev_uses_dsa() for the master will not yet return true, since
dev->dsa_ptr has not yet been populated
- packets cannot enter the dsa_slave_xmit() function because no DSA
interface has yet been registered
So from the DSA core's perspective, holding the rtnl_mutex is indeed not
necessary.
Yet, drivers may need to do things which need rtnl_mutex protection. For
example:
These drivers do not really have a choice to take the rtnl_mutex
themselves, since in the sysfs case, the rtnl_mutex is already held.
Fixes: deff710703d8 ("net: dsa: Allow default tag protocol to be overridden from DT") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Fri, 8 Oct 2021 19:38:01 +0000 (12:38 -0700)]
ionic: don't remove netdev->dev_addr when syncing uc list
Bridging, and possibly other upper stack gizmos, adds the
lower device's netdev->dev_addr to its own uc list, and
then requests it be deleted when the upper bridge device is
removed. This delete request also happens with the bridging
vlan_filtering is enabled and then disabled.
Bonding has a similar behavior with the uc list, but since it
also uses set_mac to manage netdev->dev_addr, it doesn't have
the same the failure case.
Because we store our netdev->dev_addr in our uc list, we need
to ignore the delete request from dev_uc_sync so as to not
lose the address and all hope of communicating. Note that
ndo_set_mac_address is expressly changing netdev->dev_addr,
so no limitation is set there.
Fixes: 2a654540be10 ("ionic: Add Rx filter and rx_mode ndo support") Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
During this process, the kernel thread would call detach_capi_ctr()
to detach a register controller. if the controller
was not attached yet, detach_capi_ctr() would
trigger an array-index-out-bounds bug.
[ 46.866069][ T6479] UBSAN: array-index-out-of-bounds in
drivers/isdn/capi/kcapi.c:483:21
[ 46.867196][ T6479] index -1 is out of range for type 'capi_ctr *[32]'
[ 46.867982][ T6479] CPU: 1 PID: 6479 Comm: kcmtpd_ctr_0 Not tainted
5.15.0-rc2+ #8
[ 46.869002][ T6479] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.14.0-2 04/01/2014
[ 46.870107][ T6479] Call Trace:
[ 46.870473][ T6479] dump_stack_lvl+0x57/0x7d
[ 46.870974][ T6479] ubsan_epilogue+0x5/0x40
[ 46.871458][ T6479] __ubsan_handle_out_of_bounds.cold+0x43/0x48
[ 46.872135][ T6479] detach_capi_ctr+0x64/0xc0
[ 46.872639][ T6479] cmtp_session+0x5c8/0x5d0
[ 46.873131][ T6479] ? __init_waitqueue_head+0x60/0x60
[ 46.873712][ T6479] ? cmtp_add_msgpart+0x120/0x120
[ 46.874256][ T6479] kthread+0x147/0x170
[ 46.874709][ T6479] ? set_kthread_struct+0x40/0x40
[ 46.875248][ T6479] ret_from_fork+0x1f/0x30
[ 46.875773][ T6479]
Linus Torvalds [Fri, 8 Oct 2021 23:46:09 +0000 (16:46 -0700)]
Merge tag 's390-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix potential memory leak on a error path in eBPF
- Fix handling of zpci device on reserve
* tag 's390-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: fix zpci_zdev_put() on reserve
bpf, s390: Fix potential memory leak about jit_data
mqprio: Correct stats in mqprio_dump_class_stats().
Introduction of lockless subqueues broke the class statistics.
Before the change stats were accumulated in `bstats' and `qstats'
on the stack which was then copied to struct gnet_dump.
After the change the `bstats' and `qstats' are initialized to 0
and never updated, yet still fed to gnet_dump. The code updates
the global qdisc->cpu_bstats and qdisc->cpu_qstats instead,
clobbering them. Most likely a copy-paste error from the code in
mqprio_dump().
__gnet_stats_copy_basic() and __gnet_stats_copy_queue() accumulate
the values for per-CPU case but for global stats they overwrite
the value, so only stats from the last loop iteration / tc end up
in sch->[bq]stats.
Use the on-stack [bq]stats variables again and add the stats manually
in the global case.
Fixes: ce679e8df7ed2 ("net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
https://lore.kernel.org/all/20211007175000.2334713-2-bigeasy@linutronix.de/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
DSA bridge TX forwarding offload fixes - part 1
This is part 1 of a series of fixes to the bridge TX forwarding offload
feature introduced for v5.15. Sadly, the other fixes are so intrusive
that they cannot be reasonably be sent to the "net" tree, as they also
include API changes. So they are left as part 2 for net-next.
====================
Vladimir Oltean [Thu, 7 Oct 2021 16:47:11 +0000 (19:47 +0300)]
net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports
Similar to commit 6087175b7991 ("net: dsa: mt7530: use independent VLAN
learning on VLAN-unaware bridges"), software forwarding between an
unoffloaded LAG port (a bonding interface with an unsupported policy)
and a mv88e6xxx user port directly under a bridge is broken.
We adopt the same strategy, which is to make the standalone ports not
find any ATU entry learned on a bridge port.
Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
as many FIDs as VIDs (4096). The FID is derived from the VID when
possible (the VTU maps a VID to a FID), with a fallback to the port
based default FID value when not (802.1Q Mode is disabled on the port,
or the classified VID isn't present in the VTU).
The mv88e6xxx driver makes the following use of FIDs and VIDs:
- the port's DefaultVID (to which untagged & pvid-tagged packets get
classified) is 0 and is absent from the VTU, so this kind of packets is
processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.
- every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
mv88e6xxx_atu_new() associates a FID with that VID which increases
linearly starting from 1. Like this:
bridge vlan add dev lan0 vid 100 # FID 1
bridge vlan add dev lan1 vid 100 # still FID 1
bridge vlan add dev lan2 vid 1024 # FID 2
The FID allocation made by the driver is sub-optimal for the following
reasons:
(a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
of 0 too. The difference is that the bridged ports may learn ATU
entries, while the standalone port has the requirement that it must
not, and must not find them either. Standalone ports must not use
the same FID as ports belonging to a bridge. All standalone ports
can use the same FID, since the ATU will never have an entry in
that FID.
(b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
default FID of 0 on all their ports. The FDBs will not be isolated
between these bridges. Every VLAN-unaware bridge must use the same
FID on all its ports, different from the FID of other bridge ports.
(c) Each bridge VLAN uses a unique FID which is useful for Independent
VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
will result in the same FID being used by mv88e6xxx_atu_new().
The correct behavior is for VLAN 1 in br0 to have a different FID
compared to VLAN 1 in br1.
This patch cannot fix all the above. Traditionally the DSA framework did
not care about this, and the reality is that DSA core involvement is
needed for the aforementioned issues to be solved. The only thing we can
solve here is an issue which does not require API changes, and that is
issue (a), aka use a different FID for standalone ports vs ports under
VLAN-unaware bridges.
The first step is deciding what VID and FID to use for standalone ports,
and what VID and FID for bridged ports. The 0/0 pair for standalone
ports is what they used up till now, let's keep using that. For bridged
ports, there are 2 cases:
- VLAN-aware ports will never end up using the port default FID, because
packets will always be classified to a VID in the VTU or dropped
otherwise. The FID is the one associated with the VID in the VTU.
- On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
zero (just as in the case of standalone ports), and just change the
port's default FID from 0 to a different number (say 1).
However, Tobias points out that there is one more requirement to cater to:
cross-chip bridging. The Marvell DSA header does not carry the FID in
it, only the VID. So once a packet crosses a DSA link, if it has a VID
of zero it will get classified to the default FID of that cascade port.
Relying on a port default FID for upstream cascade ports results in
contradictions: a default FID of 0 breaks ATU isolation of bridged ports
on the downstream switch, a default FID of 1 breaks standalone ports on
the downstream switch.
So not only must standalone ports have different FIDs compared to
bridged ports, they must also have different DefaultVID values.
IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
bridges, and VID 4095 maps to FID 1.
For the xmit operation to look up the same ATU database, we need to put
VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
too. All shared ports are configured to map this VID to the bridging
FID, because they are members of that VLAN in the VTU. Shared ports
don't need to have 802.1QMode enabled in any way, they always parse the
VID from the DSA header, they don't need to look at the 802.1Q header.
We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
mention that mv88e6xxx_vtu_setup() which was located right below that
call was flushing the VTU so those entries wouldn't be preserved.
So we need to relocate the VTU flushing prior to the port initialization
during ->setup(). Also note that this is why it is safe to assume that
VID 4095 will get associated with FID 1: the user ports haven't been
created, so there is no avenue for the user to create a bridge VLAN
which could otherwise race with the creation of another FID which would
otherwise use up the non-reserved FID value of 1.
[ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]
mv88e6xxx_port_db_load_purge() is the function to access the ATU for
FDB/MDB entries, and it used to determine the FID to use for
VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
But the driver only called mv88e6xxx_port_set_fid() once, during probe,
so no surprises, the port FID was always 0, the call to get_fid() was
redundant. As much as I would have wanted to not touch that code, the
logic is broken when we add a new FID which is not the port-based
default. Now the port-based default FID only corresponds to standalone
ports, and FDB/MDB entries belong to the bridging service. So while in
the future, when the DSA API will support FDB isolation, we will have to
figure out the FID based on the bridge number, for now there's a single
bridging FID, so hardcode that.
Lastly, the tagger needs to check, when it is transmitting a VLAN
untagged skb, whether it is sending it towards a bridged or a standalone
port. When we see it is bridged we assume the bridge is VLAN-unaware.
Not because it cannot be VLAN-aware but:
- if we are transmitting from a VLAN-aware bridge we are likely doing so
using TX forwarding offload. That code path guarantees that skbs have
a vlan hwaccel tag in them, so we would not enter the "else" branch
of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.
- if we are transmitting on behalf of a VLAN-aware bridge but with no TX
forwarding offload (no PVT support, out of space in the PVT, whatever),
we would indeed be transmitting with VLAN 4095 instead of the bridge
device's pvid. However we would be injecting a "From CPU" frame, and
the switch won't learn from that - it only learns from "Forward" frames.
So it is inconsequential for address learning. And VLAN 4095 is
absolutely enough for the frame to exit the switch, since we never
remove that VLAN from any port.
Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support") Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Thu, 7 Oct 2021 16:47:10 +0000 (19:47 +0300)]
net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware
The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679ca2a
("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
some issues with VLAN and decided the best way to deal with them was to
make the DSA core ignore VLANs added by the bridge while VLAN awareness
is turned off. Those issues were never explained, just presented as
"at least one corner case".
That approach had problems of its own, presented by
commit 54a0ed0df496 ("net: dsa: provide an option for drivers to always
receive bridge VLANs") for the DSA core, followed by
commit 1fb74191988f ("net: dsa: mv88e6xxx: fix vlan setup") which
applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
particular.
We still don't know what corner case Andrew saw when he wrote
commit 2ea7a679ca2a ("net: dsa: Don't add vlans when vlan filtering is
disabled"), but Tobias now reports that when we use TX forwarding
offload, pinging an external station from the bridge device is broken if
the front-facing DSA user port has flooding turned off. The full
description is in the link below, but for short, when a mv88e6xxx port
is under a VLAN-unaware bridge, it inherits that bridge's pvid.
So packets ingressing a user port will be classified to e.g. VID 1
(assuming that value for the bridge_default_pvid), whereas when
tag_dsa.c xmits towards a user port, it always sends packets using a VID
of 0 if that port is standalone or under a VLAN-unaware bridge - or at
least it did so prior to commit d82f8ab0d874 ("net: dsa: tag_dsa:
offload the bridge forwarding process").
In any case, when there is a conversation between the CPU and a station
connected to a user port, the station's MAC address is learned in VID 1
but the CPU tries to transmit through VID 0. The packets reach the
intended station, but via flooding and not by virtue of matching the
existing ATU entry.
DSA has established (and enforced in other drivers: sja1105, felix,
mt7530) that a VLAN-unaware port should use a private pvid, and not
inherit the one from the bridge. The bridge's pvid should only be
inherited when that bridge is VLAN-aware, so all state transitions need
to be handled. On the other hand, all bridge VLANs should sit in the VTU
starting with the moment when the bridge offloads them via switchdev,
they are just not used.
This solves the problem that Tobias sees because packets ingressing on
VLAN-unaware user ports now get classified to VID 0, which is also the
VID used by tag_dsa.c on xmit.
Vladimir Oltean [Thu, 7 Oct 2021 16:47:09 +0000 (19:47 +0300)]
net: dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware bridges using VID 0
The present code is structured this way due to an incomplete thought
process. In Documentation/networking/switchdev.rst we document that if a
bridge is VLAN-unaware, then the presence or lack of a pvid on a bridge
port (or on the bridge itself, for that matter) should not affect the
ability to receive and transmit tagged or untagged packets.
If the bridge on behalf of which we are sending this packet is
VLAN-aware, then the TX forwarding offload API ensures that the skb will
be VLAN-tagged (if the packet was sent by user space as untagged, it
will get transmitted town to the driver as tagged with the bridge
device's pvid). But if the bridge is VLAN-unaware, it may or may not be
VLAN-tagged. In fact the logic to insert the bridge's PVID came from the
idea that we should emulate what is being done in the VLAN-aware case.
But we shouldn't.
It appears that injecting packets using a VLAN ID of 0 serves the
purpose of forwarding the packets to the egress port with no VLAN tag
added or stripped by the hardware, and no filtering being performed.
So we can simply remove the superfluous logic.
One reason why this logic is broken is that when CONFIG_BRIDGE_VLAN_FILTERING=n,
we call br_vlan_get_pvid_rcu() but that returns an error and we do error
out, dropping all packets on xmit. Not really smart. This is also an
issue when the user deletes the bridge pvid:
$ bridge vlan del dev br0 vid 1 self
As mentioned, in both cases, packets should still flow freely, and they
do just that on any net device where the bridge is not offloaded, but on
mv88e6xxx they don't.
Vladimir Oltean [Thu, 7 Oct 2021 16:47:08 +0000 (19:47 +0300)]
net: dsa: fix bridge_num not getting cleared after ports leaving the bridge
The dp->bridge_num is zero-based, with -1 being the encoding for an
invalid value. But dsa_bridge_num_put used to check for an invalid value
by comparing bridge_num with 0, which is of course incorrect.
The result is that the bridge_num will never get cleared by
dsa_bridge_num_put, and further port joins to other bridges will get a
bridge_num larger than the previous one, and once all the available
bridges with TX forwarding offload supported by the hardware get
exhausted, the TX forwarding offload feature is simply disabled.
In the case of sja1105, 7 iterations of the loop below are enough to
exhaust the TX forwarding offload bits, and further bridge joins operate
without that feature.
ip link add br0 type bridge vlan_filtering 1
while :; do
ip link set sw0p2 master br0 && sleep 1
ip link set sw0p2 nomaster && sleep 1
done
This issue is enough of an indication that having the dp->bridge_num
invalid encoding be a negative number is prone to bugs, so this will be
changed to a one-based value, with the dp->bridge_num of zero being the
indication of no bridge. However, that is material for net-next.
Fixes: f5e165e72b29 ("net: dsa: track unique bridge numbers across all DSA switch trees") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 8 Oct 2021 20:05:39 +0000 (13:05 -0700)]
Merge tag 'xtensa-20211008' of git://github.com/jcmvbkbc/linux-xtensa
Pull xtensa fixes from Max Filippov:
- fix build/boot issues caused by CONFIG_OF vs CONFIC_USE_OF usage
- fix reset handler for xtfpga boards
* tag 'xtensa-20211008' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: xtfpga: Try software restart before simulating CPU reset
xtensa: xtfpga: use CONFIG_USE_OF instead of CONFIG_OF
xtensa: call irqchip_init only when CONFIG_USE_OF is selected
xtensa: use CONFIG_USE_OF instead of CONFIG_OF
Linus Torvalds [Fri, 8 Oct 2021 19:55:23 +0000 (12:55 -0700)]
Merge tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- fix two minor issues in the Xen privcmd driver plus a cleanup patch
for that driver
- fix multiple issues related to running as PVH guest and some related
earlyprintk fixes for other Xen guest types
- fix an issue introduced in 5.15 the Xen balloon driver
* tag 'for-linus-5.15b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: fix cancelled balloon action
xen/x86: adjust data placement
x86/PVH: adjust function/data placement
xen/x86: hook up xen_banner() also for PVH
xen/x86: generalize preferred console model from PV to PVH Dom0
xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
xen/x86: allow "earlyprintk=xen" to work for PV Dom0
xen/x86: make "earlyprintk=xen" work better for PVH Dom0
xen/x86: allow PVH Dom0 without XEN_PV=y
xen/x86: prevent PVH type from getting clobbered
xen/privcmd: drop "pages" parameter from xen_remap_pfn()
xen/privcmd: fix error handling in mmap-resource processing
xen/privcmd: replace kcalloc() by kvcalloc() when allocating empty pages