Daniel Borkmann [Wed, 15 Dec 2021 22:02:19 +0000 (22:02 +0000)]
bpf: Fix signed bounds propagation after mov32
For the case where both s32_{min,max}_value bounds are positive, the
__reg_assign_32_into_64() directly propagates them to their 64 bit
counterparts, otherwise it pessimises them into [0,u32_max] universe and
tries to refine them later on by learning through the tnum as per comment
in mentioned function. However, that does not always happen, for example,
in mov32 operation we call zext_32_to_64(dst_reg) which invokes the
__reg_assign_32_into_64() as is without subsequent bounds update as
elsewhere thus no refinement based on tnum takes place.
Thus, not calling into the __update_reg_bounds() / __reg_deduce_bounds() /
__reg_bound_offset() triplet as we do, for example, in case of ALU ops via
adjust_scalar_min_max_vals(), will lead to more pessimistic bounds when
dumping the full register state:
Technically, the smin_value=0 and smax_value=4294967295 bounds are not
incorrect, but given the register is still a constant, they break assumptions
about const scalars that smin_value == smax_value and umin_value == umax_value.
Without the smin_value == smax_value and umin_value == umax_value invariant
being intact for const scalars, it is possible to leak out kernel pointers
from unprivileged user space if the latter is enabled. For example, when such
registers are involved in pointer arithmtics, then adjust_ptr_min_max_vals()
will taint the destination register into an unknown scalar, and the latter
can be exported and stored e.g. into a BPF map value.
Fixes: 576785048b47 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Kuee K1r0a <liulin063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org>
Daniel Borkmann [Mon, 13 Dec 2021 22:25:23 +0000 (22:25 +0000)]
bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer
Fix up unprivileged test case results for 'Dest pointer in r0' verifier tests
given they now need to reject R0 containing a pointer value, and add a couple
of new related ones with 32bit cmpxchg as well.
root@foo:~/bpf/tools/testing/selftests/bpf# ./test_verifier
#0/u invalid and of negative number OK
#0/p invalid and of negative number OK
[...]
#1268/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
#1269/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
#1270/p XDP pkt read, pkt_data <= pkt_meta', good access OK
#1271/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
#1272/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
Summary: 1900 PASSED, 0 SKIPPED, 0 FAILED
Given a BPF insn can only have two registers (dst, src), the R0 is fixed and
used as an auxilliary register for input (old value) as well as output (returning
old value from memory location). While the verifier performs a number of safety
checks, it misses to reject unprivileged programs where R0 contains a pointer as
old value.
Through brute-forcing it takes about ~16sec on my machine to leak a kernel pointer
with BPF_CMPXCHG. The PoC is basically probing for kernel addresses by storing the
guessed address into the map slot as a scalar, and using the map value pointer as
R0 while SRC_REG has a canary value to detect a matching address.
Fix it by checking R0 for pointers, and reject if that's the case for unprivileged
programs.
Daniel Borkmann [Tue, 7 Dec 2021 10:07:04 +0000 (10:07 +0000)]
bpf, selftests: Add test case for atomic fetch on spilled pointer
Test whether unprivileged would be able to leak the spilled pointer either
by exporting the returned value from the atomic{32,64} operation or by reading
and exporting the value from the stack after the atomic operation took place.
Note that for unprivileged, the below atomic cmpxchg test case named "Dest
pointer in r0 - succeed" is failing. The reason is that in the dst memory
location (r10 -8) there is the spilled register r10:
Daniel Borkmann [Tue, 7 Dec 2021 12:51:56 +0000 (12:51 +0000)]
bpf: Fix kernel address leakage in atomic fetch
The change in commit f33f93ce3900 ("bpf: Propagate stack bounds to registers
in atomics w/ BPF_FETCH") around check_mem_access() handling is buggy since
this would allow for unprivileged users to leak kernel pointers. For example,
an atomic fetch/and with -1 on a stack destination which holds a spilled
pointer will migrate the spilled register type into a scalar, which can then
be exported out of the program (since scalar != pointer) by dumping it into
a map value.
The original implementation of XADD was preventing this situation by using
a double call to check_mem_access() one with BPF_READ and a subsequent one
with BPF_WRITE, in both cases passing -1 as a placeholder value instead of
register as per XADD semantics since it didn't contain a value fetch. The
BPF_READ also included a check in check_stack_read_fixed_off() which rejects
the program if the stack slot is of __is_pointer_value() if dst_regno < 0.
The latter is to distinguish whether we're dealing with a regular stack spill/
fill or some arithmetical operation which is disallowed on non-scalars, see
also 071755280b8a ("bpf: Forbid XADD on spilled pointers for unprivileged
users") for more context on check_mem_access() and its handling of placeholder
value -1.
One minimally intrusive option to fix the leak is for the BPF_FETCH case to
initially check the BPF_READ case via check_mem_access() with -1 as register,
followed by the actual load case with non-negative load_reg to propagate
stack bounds to registers.
Fixes: f33f93ce3900 ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH") Reported-by: <n4ke4mry@gmail.com> Acked-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The commit referenced below added fixup_map_timer support (to create a
BPF map containing timers), but failed to increase the size of the
map_fds array, leading to out of bounds write. Fix this by changing
MAX_NR_MAPS to 22.
Magnus Karlsson [Tue, 14 Dec 2021 10:26:07 +0000 (11:26 +0100)]
xsk: Do not sleep in poll() when need_wakeup set
Do not sleep in poll() when the need_wakeup flag is set. When this
flag is set, the application needs to explicitly wake up the driver
with a syscall (poll, recvmsg, sendmsg, etc.) to guarantee that Rx
and/or Tx processing will be processed promptly. But the current code
in poll(), sleeps first then wakes up the driver. This means that no
driver processing will occur (baring any interrupts) until the timeout
has expired.
Fix this by checking the need_wakeup flag first and if set, wake the
driver and return to the application. Only if need_wakeup is not set
should the process sleep if there is a timeout set in the poll() call.
Fixes: c4f9ea350d2c ("xsk: add support for need_wakeup flag in AF_XDP rings") Reported-by: Keith Wiles <keith.wiles@intel.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20211214102607.7677-1-magnus.karlsson@gmail.com
Paul Chaignon [Thu, 9 Dec 2021 23:47:00 +0000 (00:47 +0100)]
selftests/bpf: Tests for state pruning with u32 spill/fill
This patch adds tests for the verifier's tracking for spilled, <8B
registers. The first two test cases ensure the verifier doesn't
incorrectly prune states in case of <8B spill/fills. The last one simply
checks that a filled u64 register is marked unknown if the register
spilled in the same slack slot was less than 8B.
The map value access at the end of the first program is only incorrect
for the path R6=32. If the precision bit for register R8 isn't
backtracked through the u32 spill/fill, the R6=32 path is pruned at
instruction 9 and the program is incorrectly accepted. The second
program is a variation of the same with u32 spills and a u64 fill.
The additional instructions to introduce the first pruning point may be
a bit fragile as they depend on the heuristics for pruning points in the
verifier (currently at least 8 instructions and 2 jumps). If the
heuristics are changed, the pruning point may move (e.g., to the
subsequent jump) or disappear, which would cause the test to always pass.
Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon [Thu, 9 Dec 2021 23:46:31 +0000 (00:46 +0100)]
bpf: Fix incorrect state pruning for <8B spill/fill
Commit 645b4075d71b ("bpf: Support <8-byte scalar spill and refill")
introduced support in the verifier to track <8B spill/fills of scalars.
The backtracking logic for the precision bit was however skipping
spill/fills of less than 8B. That could cause state pruning to consider
two states equivalent when they shouldn't be.
As an example, consider the following bytecode snippet:
The verifier first walks the path with R6=3. Given the support for <8B
spill/fills, at instruction 13, it knows the condition is true and skips
instruction 14. At that point, the backtracking logic kicks in but stops
at the fill instruction since it only propagates the precision bit for
8B spill/fill. When the verifier then walks the path with R6=2, it will
consider it safe at instruction 8 because R6 is not marked as needing
precision. Instruction 14 is thus never walked and is then incorrectly
removed as 'dead code'.
It's also possible to lead the verifier to accept e.g. an out-of-bound
memory access instead of causing an incorrect dead code elimination.
This regression was found via Cilium's bpf-next CI where it was causing
a conntrack map update to be silently skipped because the code had been
removed by the verifier.
This commit fixes it by enabling support for <8B spill/fills in the
bactracking logic. In case of a <8B spill/fill, the full 8B stack slot
will be marked as needing precision. Then, in __mark_chain_precision,
any tracked register spilled in a marked slot will itself be marked as
needing precision, regardless of the spill size. This logic makes two
assumptions: (1) only 8B-aligned spill/fill are tracked and (2) spilled
registers are only tracked if the spill and fill sizes are equal. Commit 314145503c12 ("bpf: selftest: Add verifier tests for <8-byte scalar
spill and refill") covers the first assumption and the next commit in
this patchset covers the second.
Fixes: 645b4075d71b ("bpf: Support <8-byte scalar spill and refill") Signed-off-by: Paul Chaignon <paul@isovalent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ping6 is failing as it should.
COMMAND: ip netns exec ns-A /bin/ping6 -c1 -w1 fe80::7c4c:bcff:fe66:a63a%red
strace of ping6 shows it is failing with '1',
so change the expected rc from 2 to 1.
Fixes: 1fdd5f0dd4ed ("selftests: Add ipv6 ping tests to fcnal-test") Reported-by: kernel test robot <lkp@intel.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com> Link: https://lore.kernel.org/r/20211209020230.37270-1-jie2x.zhou@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Thu, 9 Dec 2021 19:26:44 +0000 (11:26 -0800)]
Merge tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and netfilter.
Current release - regressions:
- bpf, sockmap: re-evaluate proto ops when psock is removed from
sockmap
Current release - new code bugs:
- bpf: fix bpf_check_mod_kfunc_call for built-in modules
- ice: fixes for TC classifier offloads
- vrf: don't run conntrack on vrf with !dflt qdisc
Previous releases - regressions:
- bpf: fix the off-by-two error in range markings
- seg6: fix the iif in the IPv6 socket control block
- devlink: fix netns refcount leak in devlink_nl_cmd_reload()
- dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
- dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
Previous releases - always broken:
- ethtool: do not perform operations on net devices being
unregistered
- udp: use datalen to cap max gso segments
- ice: fix races in stats collection
- fec: only clear interrupt of handling queue in fec_enet_rx_queue()
- m_can: pci: fix incorrect reference clock rate
- m_can: disable and ignore ELO interrupt
- mvpp2: fix XDP rx queues registering
Misc:
- treewide: add missing includes masked by cgroup -> bpf.h
dependency"
* tag 'net-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (82 commits)
net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
net: wwan: iosm: fixes unable to send AT command during mbim tx
net: wwan: iosm: fixes net interface nonfunctional after fw flash
net: wwan: iosm: fixes unnecessary doorbell send
net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
MAINTAINERS: s390/net: remove myself as maintainer
net/sched: fq_pie: prevent dismantle issue
net: mana: Fix memory leak in mana_hwc_create_wq
seg6: fix the iif in the IPv6 socket control block
nfp: Fix memory leak in nfp_cpp_area_cache_add()
nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
nfc: fix segfault in nfc_genl_dump_devices_done
udp: using datalen to cap max gso segments
net: dsa: mv88e6xxx: error handling for serdes_power functions
can: kvaser_usb: get CAN clock frequency from device
can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
net: mvpp2: fix XDP rx queues registering
vmxnet3: fix minimum vectors alloc issue
net, neigh: clear whole pneigh_entry at alloc time
net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
...
Linus Torvalds [Thu, 9 Dec 2021 19:08:19 +0000 (11:08 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fixes for various drivers which assume that a HID device is on USB
transport, but that might not necessarily be the case, as the device
can be faked by uhid. (Greg, Benjamin Tissoires)
- fix for spurious wakeups on certain Lenovo notebooks (Thomas
Weißschuh)
- a few other device-specific quirks
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: Ignore battery for Elan touchscreen on Asus UX550VE
HID: intel-ish-hid: ipc: only enable IRQ wakeup when requested
HID: google: add eel USB id
HID: add USB_HID dependancy to hid-prodikeys
HID: add USB_HID dependancy to hid-chicony
HID: bigbenff: prevent null pointer dereference
HID: sony: fix error path in probe
HID: add USB_HID dependancy on some USB HID drivers
HID: check for valid USB device for many HID drivers
HID: wacom: fix problems when device is not a valid USB device
HID: add hid_is_usb() function to make it simpler for USB detection
HID: quirks: Add quirk for the Microsoft Surface 3 type-cover
Linus Torvalds [Thu, 9 Dec 2021 18:49:36 +0000 (10:49 -0800)]
Merge tag 'netfs-fixes-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfslib fixes from David Howells:
- Fix a lockdep warning and potential deadlock. This is takes the
simple approach of offloading the write-to-cache done from within a
network filesystem read to a worker thread to avoid taking the
sb_writer lock from the cache backing filesystem whilst holding the
mmap lock on an inode from the network filesystem.
Jan Kara posits a scenario whereby this can cause deadlock[1], though
it's quite complex and I think requires someone in userspace to
actually do I/O on the cache files. Matthew Wilcox isn't so certain,
though[2].
An alternative way to fix this, suggested by Darrick Wong, might be
to allow cachefiles to prevent userspace from performing I/O upon the
file - something like an exclusive open - but that's beyond the scope
of a fix here if we do want to make such a facility in the future.
- In some of the error handling paths where netfs_ops->cleanup() is
called, the arguments are transposed[3]. gcc doesn't complain because
one of the parameters is void* and one of the values is void*.
Sasha Levin [Thu, 9 Dec 2021 16:51:13 +0000 (11:51 -0500)]
tools/lib/lockdep: drop leftover liblockdep headers
Clean up remaining headers that are specific to liblockdep but lived in
the shared header directory. These are all unused after the liblockdep
code was removed in commit 1de3bdde3dfd ("tools/lib/lockdep: drop
liblockdep").
Note that there are still headers that were originally created for
liblockdep, that still have liblockdep references, but they are used by
other tools/ code at this point.
net: dsa: mv88e6xxx: allow use of PHYs on CPU and DSA ports
Martyn Welch reports that his CPU port is unable to link where it has
been necessary to use one of the switch ports with an internal PHY for
the CPU port. The reason behind this is the port control register is
left forcing the link down, preventing traffic flow.
This occurs because during initialisation, phylink expects the link to
be down, and DSA forces the link down by synthesising a call to the
DSA drivers phylink_mac_link_down() method, but we don't touch the
forced-link state when we later reconfigure the port.
Resolve this by also unforcing the link state when we are operating in
PHY mode and the PPU is set to poll the PHY to retrieve link status
information.
Reported-by: Martyn Welch <martyn.welch@collabora.com> Tested-by: Martyn Welch <martyn.welch@collabora.com> Fixes: 0da96bd3f59c ("net: dsa: Down cpu/dsa ports phylink will control") Cc: <stable@vger.kernel.org> # 5.7: 4de91503014d: net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's" Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/E1mvFhP-00F8Zb-Ul@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 9 Dec 2021 16:10:38 +0000 (08:10 -0800)]
Merge branch 'net-wwan-iosm-bug-fixes'
M Chetan Kumar says:
====================
net: wwan: iosm: bug fixes
This patch series brings in IOSM driver bug fixes. Patch details are
explained below.
PATCH1: stop sending unnecessary doorbell in IP tx flow.
PATCH2: Restore the IP channel configuration after fw flash.
PATCH3: Removed the unnecessary check around control port TX transfer.
====================
M Chetan Kumar [Thu, 9 Dec 2021 10:16:28 +0000 (15:46 +0530)]
net: wwan: iosm: fixes net interface nonfunctional after fw flash
Devlink initialization flow was overwriting the IP traffic
channel configuration. This was causing wwan0 network interface
to be unusable after fw flash.
When device boots to fully functional mode restore the IP channel
configuration.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
M Chetan Kumar [Thu, 9 Dec 2021 10:16:27 +0000 (15:46 +0530)]
net: wwan: iosm: fixes unnecessary doorbell send
In TX packet accumulation flow transport layer is
giving a doorbell to device even though there is
no pending control TX transfer that needs immediate
attention.
Introduced a new hpda_ctrl_pending variable to keep
track of pending control TX transfer. If there is a
pending control TX transfer which needs an immediate
attention only then give a doorbell to device.
Signed-off-by: M Chetan Kumar <m.chetan.kumar@linux.intel.com> Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
José Expósito [Thu, 9 Dec 2021 11:05:40 +0000 (12:05 +0100)]
net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
Avoid a memory leak if there is not a CPU port defined.
Fixes: 4bdcb211bbc2 ("net: dsa: felix: break at first CPU port during init and teardown")
Addresses-Coverity-ID: 1492897 ("Resource leak")
Addresses-Coverity-ID: 1492899 ("Resource leak") Signed-off-by: José Expósito <jose.exposito89@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20211209110538.11585-1-jose.exposito89@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrea Mayer [Wed, 8 Dec 2021 19:54:09 +0000 (20:54 +0100)]
seg6: fix the iif in the IPv6 socket control block
When an IPv4 packet is received, the ip_rcv_core(...) sets the receiving
interface index into the IPv4 socket control block (v5.16-rc4,
net/ipv4/ip_input.c line 510):
IPCB(skb)->iif = skb->skb_iif;
If that IPv4 packet is meant to be encapsulated in an outer IPv6+SRH
header, the seg6_do_srh_encap(...) performs the required encapsulation.
In this case, the seg6_do_srh_encap function clears the IPv6 socket control
block (v5.16-rc4 net/ipv6/seg6_iptunnel.c line 163):
memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
The memset(...) was introduced in commit 8e948a9f62f4 ("ipv6: sr: clear
IP6CB(skb) on SRH ip4ip6 encapsulation") a long time ago (2019-01-29).
Since the IPv6 socket control block and the IPv4 socket control block share
the same memory area (skb->cb), the receiving interface index info is lost
(IP6CB(skb)->iif is set to zero).
As a side effect, that condition triggers a NULL pointer dereference if
commit 05017901c97a ("ipv6: When forwarding count rx stats on the orig
netdev") is applied.
To fix that issue, we set the IP6CB(skb)->iif with the index of the
receiving interface once again.
Fixes: 8e948a9f62f4 ("ipv6: sr: clear IP6CB(skb) on SRH ip4ip6 encapsulation") Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211208195409.12169-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianglei Nie [Thu, 9 Dec 2021 06:15:11 +0000 (14:15 +0800)]
nfp: Fix memory leak in nfp_cpp_area_cache_add()
In line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a
CPP area structure. But in line 807 (#2), when the cache is allocated
failed, this CPP area structure is not freed, which will result in
memory leak.
We can fix it by freeing the CPP area when the cache is allocated
failed (#2).
nfc: fix potential NULL pointer deref in nfc_genl_dump_ses_done
The done() netlink callback nfc_genl_dump_ses_done() should check if
received argument is non-NULL, because its allocation could fail earlier
in dumpit() (nfc_genl_dump_ses()).
Jakub Kicinski [Thu, 9 Dec 2021 15:43:22 +0000 (07:43 -0800)]
Merge tag 'linux-can-fixes-for-5.16-20211209' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
can 2021-12-09
Both patches are by Jimmy Assarsson. The first one fixes the
incrementing of the rx/tx error counters in the Kvaser PCIe FD driver.
The second one fixes the Kvaser USB driver by using the CAN clock
frequency provided by the device instead of using a hard coded value.
* tag 'linux-can-fixes-for-5.16-20211209' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: kvaser_usb: get CAN clock frequency from device
can: kvaser_pciefd: kvaser_pciefd_rx_error_frame(): increase correct stats->{rx,tx}_errors counter
====================
Jimmy Assarsson [Wed, 8 Dec 2021 15:21:22 +0000 (16:21 +0100)]
can: kvaser_usb: get CAN clock frequency from device
The CAN clock frequency is used when calculating the CAN bittiming
parameters. When wrong clock frequency is used, the device may end up
with wrong bittiming parameters, depending on user requested bittiming
parameters.
To avoid this, get the CAN clock frequency from the device. Various
existing Kvaser Leaf products use different CAN clocks.
Fixes: d001c8c6a4d1 ("can: kvaser_usb: Add support for Kvaser CAN/USB devices") Link: https://lore.kernel.org/all/20211208152122.250852-2-extja@kvaser.com Cc: stable@vger.kernel.org Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Louis Amas [Tue, 7 Dec 2021 14:34:22 +0000 (15:34 +0100)]
net: mvpp2: fix XDP rx queues registering
The registration of XDP queue information is incorrect because the
RX queue id we use is invalid. When port->id == 0 it appears to works
as expected yet it's no longer the case when port->id != 0.
The problem arised while using a recent kernel version on the
MACCHIATOBin. This board has several ports:
* eth0 and eth1 are 10Gbps interfaces ; both ports has port->id == 0;
* eth2 is a 1Gbps interface with port->id != 0.
Code from xdp-tutorial (more specifically advanced03-AF_XDP) was used
to test packet capture and injection on all these interfaces. The XDP
kernel was simplified to:
SEC("xdp_sock")
int xdp_sock_prog(struct xdp_md *ctx)
{
int index = ctx->rx_queue_index;
/* A set entry here means that the correspnding queue_id
* has an active AF_XDP socket bound to it. */
if (bpf_map_lookup_elem(&xsks_map, &index))
return bpf_redirect_map(&xsks_map, index, 0);
return XDP_PASS;
}
Starting the program using:
./af_xdp_user -d DEV
Gives the following result:
* eth0 : ok
* eth1 : ok
* eth2 : no capture, no injection
Investigating the issue shows that XDP rx queues for eth2 are wrong:
XDP expects their id to be in the range [0..3] but we found them to be
in the range [32..35].
Trying to force rx queue ids using:
./af_xdp_user -d eth2 -Q 32
fails as expected (we shall not have more than 4 queues).
When we register the XDP rx queue information (using
xdp_rxq_info_reg() in function mvpp2_rxq_init()) we tell it to use
rxq->id as the queue id. This value is computed as:
rxq->id = port->id * max_rxq_count + queue_id
where max_rxq_count depends on the device version. In the MACCHIATOBin
case, this value is 32, meaning that rx queues on eth2 are numbered
from 32 to 35 - there are four of them.
Clearly, this is not the per-port queue id that XDP is expecting:
it wants a value in the range [0..3]. It shall directly use queue_id
which is stored in rxq->logic_rxq -- so let's use that value instead.
rxq->id is left untouched ; its value is indeed valid but it should
not be used in this context.
This is consistent with the remaining part of the code in
mvpp2_rxq_init().
With this change, packet capture is working as expected on all the
MACCHIATOBin ports.
Fixes: 887e8381f684 ("mvpp2: use page_pool allocator") Signed-off-by: Louis Amas <louis.amas@eho.link> Signed-off-by: Emmanuel Deloget <emmanuel.deloget@eho.link> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20211207143423.916334-1-louis.amas@eho.link Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ronak Doshi [Tue, 7 Dec 2021 08:17:37 +0000 (00:17 -0800)]
vmxnet3: fix minimum vectors alloc issue
'Commit 4f9f3aa6b852 ("vmxnet3: add support for 32 Tx/Rx queues")'
added support for 32Tx/Rx queues. Within that patch, value of
VMXNET3_LINUX_MIN_MSIX_VECT was updated.
However, there is a case (numvcpus = 2) which actually requires 3
intrs which matches VMXNET3_LINUX_MIN_MSIX_VECT which then is
treated as failure by stack to allocate more vectors. This patch
fixes this issue.
Fixes: 4f9f3aa6b852 ("vmxnet3: add support for 32 Tx/Rx queues") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Link: https://lore.kernel.org/r/20211207081737.14000-1-doshir@vmware.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
CPU: 1 PID: 20001 Comm: syz-executor.0 Not tainted 5.16.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: ea59d98fdd3e ("[IPV6] NDISC: Set per-entry is_router flag in Proxy NA.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Roopa Prabhu <roopa@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211206165329.1049835-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1) Fix bogus compilter warning in nfnetlink_queue, from Florian Westphal.
2) Don't run conntrack on vrf with !dflt qdisc, from Nicolas Dichtel.
3) Fix nft_pipapo bucket load in AVX2 lookup routine for six 8-bit
groups, from Stefano Brivio.
4) Break rule evaluation on malformed TCP options.
5) Use socat instead of nc in selftests/netfilter/nft_zones_many.sh,
also from Florian
6) Fix KCSAN data-race in conntrack timeout updates, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: conntrack: annotate data-races around ct->timeout
selftests: netfilter: switch zone stress to socat
netfilter: nft_exthdr: break evaluation if setting TCP option fails
selftests: netfilter: Add correctness test for mac,net set type
nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
vrf: don't run conntrack on vrf with !dflt qdisc
netfilter: nfnetlink_queue: silence bogus compiler warning
====================
We've added 12 non-merge commits during the last 22 day(s) which contain
a total of 29 files changed, 659 insertions(+), 80 deletions(-).
The main changes are:
1) Fix an off-by-two error in packet range markings and also add a batch of
new tests for coverage of these corner cases, from Maxim Mikityanskiy.
2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.
3) Fix two functional regressions and a build warning related to BTF kfunc
for modules, from Kumar Kartikeya Dwivedi.
4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
PREEMPT_RT kernels, from Sebastian Andrzej Siewior.
5) Add missing includes in order to be able to detangle cgroup vs bpf header
dependencies, from Jakub Kicinski.
6) Fix regression in BPF sockmap tests caused by missing detachment of progs
from sockets when they are removed from the map, from John Fastabend.
7) Fix a missing "no previous prototype" warning in x86 JIT caused by BPF
dispatcher, from Björn Töpel.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add selftests to cover packet access corner cases
bpf: Fix the off-by-two error in range markings
treewide: Add missing includes masked by cgroup -> bpf dependency
tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
bpf: Fix bpf_check_mod_kfunc_call for built-in modules
bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
mips, bpf: Fix reference to non-existing Kconfig symbol
bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
Documentation/locking/locktypes: Update migrate_disable() bits.
bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
bpf, sockmap: Attach map progs to psock early for feature probes
bpf, x86: Fix "no previous prototype" warning
====================
net: dsa: mv88e6xxx: fix "don't use PHY_DETECT on internal PHY's"
This commit fixes a misunderstanding in commit 06c5f7f5cb14 ("net: dsa:
mv88e6xxx: don't use PHY_DETECT on internal PHY's").
For Marvell DSA switches with the PHY_DETECT bit (for non-6250 family
devices), controls whether the PPU polls the PHY to retrieve the link,
speed, duplex and pause status to update the port configuration. This
applies for both internal and external PHYs.
For some switches such as 88E6352 and 88E6390X, PHY_DETECT has an
additional function of enabling auto-media mode between the internal
PHY and SERDES blocks depending on which first gains link.
The original intention of commit 83d5a3c13a1b (net: dsa: mv88e6xxx: use
PHY_DETECT in mac_link_up/mac_link_down) was to allow this bit to be
used to detect when this propagation is enabled, and allow software to
update the port configuration. This has found to be necessary for some
switches which do not automatically propagate status from the SERDES to
the port, which includes the 88E6390. However, commit 06c5f7f5cb14
("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") breaks
this assumption.
Maarten Zanders has confirmed that the issue he was addressing was for
an 88E6250 switch, which does not have a PHY_DETECT bit in bit 12, but
instead a link status bit. Therefore, mv88e6xxx_port_ppu_updates() does
not report correctly.
This patch resolves the above issues by reverting Maarten's change and
instead making mv88e6xxx_port_ppu_updates() indicate whether the port
is internal for the 88E6250 family of switches.
Yes, you're right, I'm targeting the 6250 family. And yes, your
suggestion would solve my case and is a better implementation for
the other devices (as far as I can see).
Fixes: 06c5f7f5cb14 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Maarten Zanders <maarten.zanders@mind.be> Link: https://lore.kernel.org/r/E1muXm7-00EwJB-7n@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jesse Brandeburg [Sat, 13 Nov 2021 01:06:02 +0000 (17:06 -0800)]
ice: safer stats processing
The driver was zeroing live stats that could be fetched by
ndo_get_stats64 at any time. This could result in inconsistent
statistics, and the telltale sign was when reading stats frequently from
/proc/net/dev, the stats would go backwards.
Fix by collecting stats into a local, and delaying when we write to the
structure so it's not incremental.
Fixes: 7c3e0ebc363c ("ice: Add stats and ethtool support") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Hans de Goede [Tue, 7 Dec 2021 12:10:53 +0000 (13:10 +0100)]
HID: Ignore battery for Elan touchscreen on Asus UX550VE
Battery status is reported for the Asus UX550VE touchscreen even though
it does not have a battery. Prevent it from always reporting the
battery as low.
bpf: Add selftests to cover packet access corner cases
This commit adds BPF verifier selftests that cover all corner cases by
packet boundary checks. Specifically, 8-byte packet reads are tested at
the beginning of data and at the beginning of data_meta, using all kinds
of boundary checks (all comparison operators: <, >, <=, >=; both
permutations of operands: data + length compared to end, end compared to
data + length). For each case there are three tests:
1. Length is just enough for an 8-byte read. Length is either 7 or 8,
depending on the comparison.
2. Length is increased by 1 - should still pass the verifier. These
cases are useful, because they failed before commit 31853f3d9196
("bpf: Fix the off-by-two error in range markings").
3. Length is decreased by 1 - should be rejected by the verifier.
Some existing tests are just renamed to avoid duplication.
Joakim Zhang [Mon, 6 Dec 2021 13:54:57 +0000 (21:54 +0800)]
net: fec: only clear interrupt of handling queue in fec_enet_rx_queue()
Background:
We have a customer is running a Profinet stack on the 8MM which receives and
responds PNIO packets every 4ms and PNIO-CM packets every 40ms. However, from
time to time the received PNIO-CM package is "stock" and is only handled when
receiving a new PNIO-CM or DCERPC-Ping packet (tcpdump shows the PNIO-CM and
the DCERPC-Ping packet at the same time but the PNIO-CM HW timestamp is from
the expected 40 ms and not the 2s delay of the DCERPC-Ping).
After debugging, we noticed PNIO, PNIO-CM and DCERPC-Ping packets would
be handled by different RX queues.
The root cause should be driver ack all queues' interrupt when handle a
specific queue in fec_enet_rx_queue(). The blamed patch is introduced to
receive as much packets as possible once to avoid interrupt flooding.
But it's unreasonable to clear other queues'interrupt when handling one
queue, this patch tries to fix it.
Fixes: eeae11157a75 (net: fec: clear receive interrupts before processing a packet) Cc: Russell King <rmk+kernel@arm.linux.org.uk> Reported-by: Nicolas Diaz <nicolas.diaz@nxp.com> Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com> Link: https://lore.kernel.org/r/20211206135457.15946-1-qiangqing.zhang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G W 5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: krdsd rds_send_worker
Note: I chose an arbitrary commit for the Fixes: tag,
because I do not think we need to backport this fix to very old kernels.
Fixes: 70518a997534 ("netfilter: conntrack: avoid possible false sharing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stefano Brivio [Sat, 27 Nov 2021 10:33:38 +0000 (11:33 +0100)]
selftests: netfilter: Add correctness test for mac,net set type
The existing net,mac test didn't cover the issue recently reported
by Nikita Yushchenko, where MAC addresses wouldn't match if given
as first field of a concatenated set with AVX2 and 8-bit groups,
because there's a different code path covering the lookup of six
8-bit groups (MAC addresses) if that's the first field.
Add a similar mac,net test, with MAC address and IPv4 address
swapped in the set specification.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stefano Brivio [Sat, 27 Nov 2021 10:33:37 +0000 (11:33 +0100)]
nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groups
The sixth byte of packet data has to be looked up in the sixth group,
not in the seventh one, even if we load the bucket data into ymm6
(and not ymm5, for convenience of tracking stalls).
Without this fix, matching on a MAC address as first field of a set,
if 8-bit groups are selected (due to a small set size) would fail,
that is, the given MAC address would never match.
Nicolas Dichtel [Fri, 26 Nov 2021 14:36:12 +0000 (15:36 +0100)]
vrf: don't run conntrack on vrf with !dflt qdisc
After the below patch, the conntrack attached to skb is set to "notrack" in
the context of vrf device, for locally generated packets.
But this is true only when the default qdisc is set to the vrf device. When
changing the qdisc, notrack is not set anymore.
In fact, there is a shortcut in the vrf driver, when the default qdisc is
set, see commit 9b4a779ec95d ("net: vrf: performance improvements for
IPv4") for more details.
This patch ensures that the behavior is always the same, whatever the qdisc
is.
To demonstrate the difference, a new test is added in conntrack_vrf.sh.
Fixes: 37196bb42752 ("vrf: run conntrack only in context of lower/physdev for locally generated packets") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 7 Dec 2021 23:36:45 +0000 (15:36 -0800)]
Merge tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix SMT detection fast read path on sysfs.
- Fix memory leaks when processing feature headers in perf.data files.
- Fix 'Simple expression parser' 'perf test' on arch without CPU die
topology info, such as s/390.
- Fix building perf with BUILD_BPF_SKEL=1.
- Fix 'perf bench' by reverting "perf bench: Fix two memory leaks
detected with ASan".
- Fix itrace space allowed for new attributes in 'perf script'.
- Fix the build feature detection fast path, that was always failing on
systems with python3 development packages, speeding up the build.
- Reset shadow counts before loading, fixing metrics using
duration_time.
- Sync more kernel headers changed by the new futex_waitv syscall: s390
and powerpc.
* tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf_skel: Do not use typedef to avoid error on old clang
perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
perf header: Fix memory leaks when processing feature headers
perf test: Reset shadow counts before loading
perf test: Fix 'Simple expression parser' test on arch without CPU die topology info
tools build: Remove needless libpython-version feature check that breaks test-all fast path
perf tools: Fix SMT detection fast read path
tools headers UAPI: Sync powerpc syscall table file changed by new futex_waitv syscall
perf inject: Fix itrace space allowed for new attributes
tools headers UAPI: Sync s390 syscall table file changed by new futex_waitv syscall
Revert "perf bench: Fix two memory leaks detected with ASan"
Adding filters with the same values inside for VXLAN and Geneve causes HW
error, because it looks exactly the same. To choose between different
type of tunnels new recipe is needed. Add storing tunnel types in
creating recipes function and start checking it in finding function.
Change getting open tunnels function to return port on correct tunnel
type. This is needed to copy correct port to dummy packet.
Block user from adding enc_dst_port via tc flower, because VXLAN and
Geneve filters can be created only with destination port which was
previously opened.
Fixes: d8d7095ca94a7 ("ice: low level support for tunnels") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
In tunnels packet there can be two UDP headers:
- outer which for hw should be mark as ICE_UDP_OF
- inner which for hw should be mark as ICE_UDP_ILOS or as ICE_TCP_IL if
inner header is of TCP type
In none tunnels packet header can be:
- UDP, which for hw should be mark as ICE_UDP_ILOS
- TCP, which for hw should be mark as ICE_TCP_IL
Change incorrect ICE_UDP_OF for none tunnel packets to ICE_UDP_ILOS.
ICE_UDP_OF is incorrect for none tunnel packets and setting it leads to
error from hw while adding this kind of recipe.
In summary, for tunnel outer port type should always be set to
ICE_UDP_OF, for none tunnel outer and tunnel inner it should always be
set to ICE_UDP_ILOS.
Fixes: 0be0edf75f72 ("ice: VXLAN and Geneve TC support") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Sat, 23 Oct 2021 00:28:17 +0000 (17:28 -0700)]
ice: ignore dropped packets during init
If the hardware is constantly receiving unicast or broadcast packets
during driver load, the device previously counted many GLV_RDPC (VSI
dropped packets) events during init. This causes confusing dropped
packet statistics during driver load. The dropped packets counter
incrementing does stop once the driver finishes loading.
Avoid this problem by baselining our statistics at the end of driver
open instead of the end of probe.
Fixes: 03e3b9ab0aad ("ice: Configure VSIs for Tx/Rx") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dave Ertman [Tue, 12 Oct 2021 20:31:21 +0000 (13:31 -0700)]
ice: Fix problems with DSCP QoS implementation
The patch that implemented DSCP QoS implementation removed a
bandwidth check that was used to check for a specific condition
caused by some corner cases. This check should not of been
removed.
The same patch also added a check for when the DCBx state could
be changed in relation to DSCP, but the check was erroneously
added nested in a check for CEE mode, which made the check useless.
Fix these problems by re-adding the bandwidth check and relocating
the DSCP mode check earlier in the function that changes DCBx state
in the driver.
Fixes: c56146c4c21c ("ice: Add DSCP support") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Paul Greenwalt [Mon, 12 Jul 2021 11:54:25 +0000 (07:54 -0400)]
ice: rearm other interrupt cause register after enabling VFs
The other interrupt cause register (OICR), global interrupt 0, is
disabled when enabling VFs to prevent handling VFLR. If the OICR is
not rearmed then the VF cannot communicate with the PF.
Rearm the OICR after enabling VFs.
Fixes: fcefa27d96fe ("ice: Separate VF VSI initialization/creation from reset flow") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Yahui Cao [Wed, 5 May 2021 21:18:00 +0000 (14:18 -0700)]
ice: fix FDIR init missing when reset VF
When VF is being reset, ice_reset_vf() will be called and FDIR
resource should be released and initialized again.
Fixes: 116686042684 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Yahui Cao <yahui.cao@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dan Carpenter [Tue, 7 Dec 2021 08:24:16 +0000 (11:24 +0300)]
net/qla3xxx: fix an error code in ql_adapter_up()
The ql_wait_for_drvr_lock() fails and returns false, then this
function should return an error code instead of returning success.
The other problem is that the success path prints an error message
netdev_err(ndev, "Releasing driver lock\n"); Delete that and
re-order the code a little to make it more clear.
Fixes: 37c471ba9b41 ("[PATCH] qla3xxx NIC driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211207082416.GA16110@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 7 Dec 2021 18:32:04 +0000 (10:32 -0800)]
Merge tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
can 2021-12-07
The 1st patch is by Vincent Mailhol and fixes a use after free in the
pch_can driver.
Dan Carpenter fixes a use after free in the ems_pcmcia sja1000 driver.
The remaining 7 patches target the m_can driver. Brian Silverman
contributes a patch to disable and ignore the ELO interrupt, which is
currently not handled in the driver and may lead to an interrupt
storm. Vincent Mailhol's patch fixes a memory leak in the error path
of the m_can_read_fifo() function. The remaining patches are
contributed by Matthias Schiffer, first a iomap_read_fifo() and
iomap_write_fifo() functions are fixed in the PCI glue driver, then
the clock rate for the Intel Ekhart Lake platform is fixed, the last 3
patches add support for the custom bit timings on the Elkhart Lake
platform.
* tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: m_can: pci: use custom bit timings for Elkhart Lake
can: m_can: make custom bittiming fields const
Revert "can: m_can: remove support for custom bit timing"
can: m_can: pci: fix incorrect reference clock rate
can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
can: m_can: m_can_read_fifo: fix memory leak in error branch
can: m_can: Disable and ignore ELO interrupt
can: sja1000: fix use after free in ems_pcmcia_add_card()
can: pch_can: pch_can_rx_normal: fix use after free
====================
Linus Torvalds [Tue, 7 Dec 2021 18:10:20 +0000 (10:10 -0800)]
Merge tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"Various bug-fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/intel: hid: add quirk to support Surface Go 3
platform/x86: amd-pmc: Fix s2idle failures on certain AMD laptops
platform/x86: touchscreen_dmi: Add TrekStor SurfTab duo W1 touchscreen info
platform/x86: lg-laptop: Recognize more models
platform/x86: thinkpad_acpi: Add lid_logo_dot to the list of safe LEDs
platform/x86: thinkpad_acpi: Restore missing hotkey_tablet_mode and hotkey_radio_sw sysfs-attr
Jeffle Xu [Tue, 7 Dec 2021 03:14:49 +0000 (11:14 +0800)]
netfs: fix parameter of cleanup()
The order of these two parameters is just reversed. gcc didn't warn on
that, probably because 'void *' can be converted from or to other
pointer types without warning.
Cc: stable@vger.kernel.org Fixes: f04d087afe94 ("netfs: Provide readahead and readpage netfs helpers") Fixes: 4784311c1fa0 ("netfs: Add write_begin helper") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/
David Howells [Tue, 7 Dec 2021 09:53:24 +0000 (09:53 +0000)]
netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in
a lockdep warning like that below. The problem comes from cachefiles
needing to take the sb_writers lock in order to do a write to the cache,
but being asked to do this by netfslib called from readpage, readahead or
write_begin[1].
Fix this by always offloading the write to the cache off to a worker
thread. The main thread doesn't need to wait for it, so deadlock can be
avoided.
This can be tested by running the quick xfstests on something like afs or
ceph with lockdep enabled.
WARNING: possible circular locking dependency detected
5.15.0-rc1-build2+ #292 Not tainted
------------------------------------------------------
holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5
but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
can: m_can: pci: use custom bit timings for Elkhart Lake
The relevant datasheet [1] specifies nonstandard limits for the bit timing
parameters. While it is unclear what the exact effect of violating these
limits is, it seems like a good idea to adhere to the documentation.
[1] Intel Atom® x6000E Series, and Intel® Pentium® and Celeron® N and J
Series Processors for IoT Applications Datasheet,
Volume 2 (Book 3 of 3), July 2021, Revision 001
When testing the CAN controller on our Ekhart Lake hardware, we
determined that all communication was running with twice the configured
bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed
this. Intel's support has confirmed to us that 200MHz is indeed the
correct clock rate.
can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()
The same fix that was previously done in m_can_platform in commit d5fc7a3b4452 ("can: m_can: fix iomap_read_fifo() and iomap_write_fifo()")
is required in m_can_pci as well to make iomap_read_fifo() and
iomap_write_fifo() work for val_count > 1.
Fixes: 3f58314dc5bd ("can: m_can: Batch FIFO writes during CAN transmit") Fixes: 1c5ca6a105f2 ("can: m_can: Batch FIFO reads during CAN receive") Link: https://lore.kernel.org/all/20211118144011.10921-1-matthias.schiffer@ew.tq-group.com Cc: stable@vger.kernel.org Cc: Matt Kline <matt@bitbashing.io> Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Vincent Mailhol [Sun, 7 Nov 2021 05:07:55 +0000 (14:07 +0900)]
can: m_can: m_can_read_fifo: fix memory leak in error branch
In m_can_read_fifo(), if the second call to m_can_fifo_read() fails,
the function jump to the out_fail label and returns without calling
m_can_receive_skb(). This means that the skb previously allocated by
alloc_can_skb() is not freed. In other terms, this is a memory leak.
This patch adds a goto label to destroy the skb if an error occurs.
Issue was found with GCC -fanalyzer, please follow the link below for
details.
Fixes: 81caecb72d42 ("can: m_can: Disable IRQs on FIFO bus errors") Link: https://lore.kernel.org/all/20211107050755.70655-1-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Cc: Matt Kline <matt@bitbashing.io> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Brian Silverman [Mon, 29 Nov 2021 22:26:28 +0000 (14:26 -0800)]
can: m_can: Disable and ignore ELO interrupt
With the design of this driver, this condition is often triggered.
However, the counter that this interrupt indicates an overflow is never
read either, so overflowing is harmless.
On my system, when a CAN bus starts flapping up and down, this locks up
the whole system with lots of interrupts and printks.
Specifically, this interrupt indicates the CEL field of ECR has
overflowed. All reads of ECR mask out CEL.
Vincent Mailhol [Tue, 23 Nov 2021 11:16:54 +0000 (20:16 +0900)]
can: pch_can: pch_can_rx_normal: fix use after free
After calling netif_receive_skb(skb), dereferencing skb is unsafe.
Especially, the can_frame cf which aliases skb memory is dereferenced
just after the call netif_receive_skb(skb).
Song Liu [Fri, 3 Dec 2021 23:14:41 +0000 (15:14 -0800)]
perf bpf_skel: Do not use typedef to avoid error on old clang
When building bpf_skel with clang-10, typedef causes confusions like:
libbpf: map 'prev_readings': unexpected def kind var.
Fix this by removing the typedef.
Fixes: 512547d66458c505 ("perf stat: Introduce 'bperf' to share hardware PMCs with BPF") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/BEF5C312-4331-4A60-AEC0-AD7617CB2BC4@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Song Liu [Fri, 3 Dec 2021 19:32:34 +0000 (19:32 +0000)]
perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros
Arnaldo reported that building all his containers with BUILD_BPF_SKEL=1
to then make this the default he found problems in some distros where
the system linux/bpf.h file was being used and lacked this:
util/bpf_skel/bperf_leader.bpf.c:13:20: error: use of undeclared identifier 'BPF_F_PRESERVE_ELEMS'
__uint(map_flags, BPF_F_PRESERVE_ELEMS);
So use instead the vmlinux.h file generated by bpftool from BTF info.
This fixed these as well, getting the build back working on debian:11,
debian:experimental and ubuntu:21.10:
In file included from In file included from util/bpf_skel/bperf_leader.bpf.cutil/bpf_skel/bpf_prog_profiler.bpf.c::33:
:
In file included from In file included from /usr/include/linux/bpf.h/usr/include/linux/bpf.h::1111:
:
/usr/include/linux/types.h/usr/include/linux/types.h::55::1010:: In file included from util/bpf_skel/bperf_follower.bpf.c:3fatal errorfatal error:
: : In file included from /usr/include/linux/bpf.h:'asm/types.h' file not found11'asm/types.h' file not found:
/usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found
#include <asm/types.h>#include <asm/types.h>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Song Liu <song@kernel.org> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/CF175681-8101-43D1-ABDB-449E644BE986@fb.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ian Rogers [Sun, 28 Nov 2021 08:58:10 +0000 (00:58 -0800)]
perf test: Reset shadow counts before loading
Otherwise load counting is an average. Without this change
duration_time in test_memory_bandwidth will alter its value if an
earlier test contains duration_time.
This patch fixes an issue that's introduced in the proposed patch:
https://lore.kernel.org/lkml/20211124015226.3317994-1-irogers@google.com/
in perf test "Parse and process metrics".
Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Clarke <pc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20211128085810.4027314-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
if (uname(&uts) < 0)
return false;
if (strncmp(uts.machine, "x86_64", 6))
return false;
....
}
which always returns false on s390. The caller build_cpu_topology()
checks has_die_topology() return value. On false the the struct
cpu_topology::die_cpu_list is not contructed and has zero entries. This
leads to the failing comparison: #num_dies >= #num_packages. s390 of
course has a positive number of packages.
Fix this and check if the function build_cpu_topology() did build up
a die_cpus_list. The number of entries in this list should be larger
than 0. If the number of list element is zero, the die_cpus_list has
not been created and the check in function test__expr():
# perf test -Fv 7
7: Simple expression parser :
--- start ---
division by zero
syntax error
---- end ----
Simple expression parser: Ok
#
Fixes: e896a2cb3d0f79b0 ("perf expr: Add metric literals for topology.") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20211129112339.3003036-1-tmricht@linux.ibm.com
[ Added comment in the added 'if (num_dies)' line about architectures not having die topology ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools build: Remove needless libpython-version feature check that breaks test-all fast path
Since f6cb9c5edd69e277 ("perf tools: Add Python 3 support") we don't use
the tools/build/feature/test-libpython-version.c version in any Makefile
feature check:
$ find tools/ -type f | xargs grep feature-libpython-version
$
The only place where this was used was removed in f6cb9c5edd69e277:
And nowadays we either build with PYTHON=python3 or just install the
python3 devel packages and perf will build against it.
But the leftover feature-libpython-version check made the fast path
feature detection to break in all cases except when python2 devel files
were installed:
As python3 is the norm these days, fix this by just removing the unused
feature-libpython-version feature check, making the test-all fast path
to work with the common case.
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
$ ldd ~/bin/perf | grep python
libpython3.9.so.1.0 => /lib64/libpython3.9.so.1.0 (0x00007f58800b0000)
$ cat /tmp/build/perf/feature/test-all.make.output
$
Reviewed-by: James Clark <james.clark@arm.com> Fixes: f6cb9c5edd69e277 ("perf tools: Add Python 3 support") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jaroslav Škarvada <jskarvad@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/YaYmeeC6CS2b8OSz@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com>, Cc: André Almeida <andrealmeid@collabora.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/YZ%2F1OU9mJuyS2HMa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Thu, 25 Nov 2021 07:14:57 +0000 (09:14 +0200)]
perf inject: Fix itrace space allowed for new attributes
The space allowed for new attributes can be too small if existing header
information is large. That can happen, for example, if there are very
many CPUs, due to having an event ID per CPU per event being stored in the
header information.
Fix by adding the existing header.data_offset. Also increase the extra
space allowed to 8KiB and align to a 4KiB boundary for neatness.
That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.
$ grep futex tools/perf/arch/s390/entry/syscalls/syscall.tbl
238 common futex sys_futex sys_futex_time32
422 32 futex_time64 - sys_futex
449 common futex_waitv sys_futex_waitv sys_futex_waitv
$
This addresses this perf build warnings:
Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Adrian Hunter <adrian.hunter@intel.com>, Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/lkml/YZ%2F2qRW%2FTScYTP1U@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRRRRRRR Ok
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRR FAILED!
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRR Ok
# perf test 91
91: perf stat --bpf-counters test :RRRRRRRRRRRRRRR Ok
yep, it seems the perf bench is broken so the counts won't correlated if
I revert this one:
393df64b8881 perf bench: Fix two memory leaks detected with ASan
it works for me again.. it seems to break -t option
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sohaib Mohamed <sohaib.amhmd@gmail.com> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/lkml/YZev7KClb%2Fud43Lc@krava/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Antoine Tenart [Fri, 3 Dec 2021 10:13:18 +0000 (11:13 +0100)]
ethtool: do not perform operations on net devices being unregistered
There is a short period between a net device starts to be unregistered
and when it is actually gone. In that time frame ethtool operations
could still be performed, which might end up in unwanted or undefined
behaviours[1].
Do not allow ethtool operations after a net device starts its
unregistration. This patch targets the netlink part as the ioctl one
isn't affected: the reference to the net device is taken and the
operation is executed within an rtnl lock section and the net device
won't be found after unregister.
[1] For example adding Tx queues after unregister ends up in NULL
pointer exceptions and UaFs, such as:
BUG: KASAN: use-after-free in kobject_get+0x14/0x90
Read of size 1 at addr ffff88801961248c by task ethtool/755
Makefile: Do not quote value for CONFIG_CC_IMPLICIT_FALLTHROUGH
Andreas reported that a specific build environment for an external
module, being a bit broken, does pass CC_IMPLICIT_FALLTHROUGH quoted as
argument to gcc, causing an error
gcc-11: error: "-Wimplicit-fallthrough=5": linker input file not found: No such file or directory
Until this is more generally fixed as outlined in [1], by fixing
scripts/link-vmlinux.sh, scripts/gen_autoksyms.sh, etc to not directly
include the include/config/auto.conf, and in a second step, change
Kconfig to generate the auto.conf without "", workaround the issue by
explicitly unquoting CC_IMPLICIT_FALLTHROUGH.
Linus Torvalds [Mon, 6 Dec 2021 18:46:20 +0000 (10:46 -0800)]
Merge tag 'docs-5.16-3' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"A few important documentation fixes, including breakage that comes
with v1.0 of the ReadTheDocs theme"
* tag 'docs-5.16-3' of git://git.lwn.net/linux:
Documentation: Add minimum pahole version
Documentation/process: fix self reference
docs: admin-guide/blockdev: Remove digraph of node-states
docs: conf.py: fix support for Readthedocs v 1.0.0