There will probably be more checks, some for nic flows, some for fdb
flows and some are shared checks. Split it for fdb and nic to avoid
the function getting too big.
The driver should add offloaded rules with either a fwd or drop action.
The check existed in parsing fdb flows but not when parsing nic flows.
Move the test into actions_match_supported() which is called for
checking nic flows and fdb flows.
We have CONFIG_FRAMEBUFFER_CONSOLE=y in the defconfigs, but that depends
on CONFIG_FB so it's not actually getting set. I'm assuming most users
on real systems want a framebuffer console, so this enables CONFIG_FB to
allow that to take effect.
The root cause is the size of BPF_PSEUDO_FUNC instruction increases
from 2 to 3 after the address of called bpf-function is settled and
there are two bpf-to-bpf calls in test_pkt_access. The generated
instructions are shown below:
This patch is to fix an out-of-bound access issue when jit-ing the
bpf_pseudo_func insn (i.e. ld_imm64 with src_reg == BPF_PSEUDO_FUNC)
In jit_subprog(), it currently reuses the subprog index cached in
insn[1].imm. This subprog index is an index into a few array related
to subprogs. For example, in jit_subprog(), it is an index to the newly
allocated 'struct bpf_prog **func' array.
The subprog index was cached in insn[1].imm after add_subprog(). However,
this could become outdated (and too big in this case) if some subprogs
are completely removed during dead code elimination (in
adjust_subprog_starts_after_remove). The cached index in insn[1].imm
is not updated accordingly and causing out-of-bound issue in the later
jit_subprog().
Unlike bpf_pseudo_'func' insn, the current bpf_pseudo_'call' insn
is handling the DCE properly by calling find_subprog(insn->imm) to
figure out the index instead of caching the subprog index.
The existing bpf_adj_branches() will adjust the insn->imm
whenever insn is added or removed.
Instead of having two ways handling subprog index,
this patch is to make bpf_pseudo_func works more like
bpf_pseudo_call.
First change is to stop caching the subprog index result
in insn[1].imm after add_subprog(). The verification
process will use find_subprog(insn->imm) to figure
out the subprog index.
Second change is in bpf_adj_branches() and have it to
adjust the insn->imm for the bpf_pseudo_func insn also
whenever insn is added or removed.
Third change is in jit_subprog(). Like the bpf_pseudo_call handling,
bpf_pseudo_func temporarily stores the find_subprog() result
in insn->off. It is fine because the prog's insn has been finalized
at this point. insn->off will be reset back to 0 later to avoid
confusing the userspace prog dump tool.
Fixes: 359bfbf9bf1e ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211106014014.651018-1-kafai@fb.com Cc: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a possible race enabling/disabling runtime-pm between
mt7921_pm_set() and mt7921_poll_rx() since mt7921_pm_wake_work()
always schedules rx-napi callback and it will trigger
mt7921_pm_power_save_work routine putting chip to in low-power state
during mt7921_pm_set processing.
Suggested-by: Deren Wu <deren.wu@mediatek.com> Tested-by: Deren Wu <deren.wu@mediatek.com> Fixes: 9fd845cbeade ("mt76: mt7921: introduce Runtime PM support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/0f3e075a2033dc05f09dab4059e5be8cbdccc239.1640094847.git.lorenzo@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Introduce mt7921_mcu_set_beacon_filter utility routine in order to
remove duplicated code for hw beacon filtering.
Move mt7921_pm_interface_iter in debugfs since it is just used there.
Make the following routine static:
- mt7921_pm_interface_iter
- mt7921_mcu_uni_bss_bcnft
- mt7921_mcu_set_bss_pm
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
The driver core sets struct device->driver before calling out
to the bus' probe() method, this leaves a window where an ACPI
notify may happen on the WMI object before the driver's
probe() method has completed running, causing e.g. the
driver's notify() callback to get called with drvdata
not yet being set leading to a NULL pointer deref.
At a check for this to the WMI core, ensuring that the notify()
callback is not called before the driver is ready.
Fixes: 3e97f9405d1c ("platform/x86: wmi: Incorporate acpi_install_notify_handler") Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20211128190031.405620-2-hdegoede@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
As it was reported and discussed in: https://lore.kernel.org/lkml/CAHk-=whF9F89vsfH8E9TGc0tZA-yhzi2Di8wOtquNB5vRkFX5w@mail.gmail.com/
This patch improves the stack space of qede_config_rx_mode() by
splitting filter_config() to 3 functions and removing the
union qed_filter_type_params.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Shai Malin <smalin@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Wakeup mhi is needed before pci_read/write only for QCA6390 and WCN6855. Since
wakeup & release mhi is enabled for all hardwares, below mhi assert is seen in
QCN9074 when doing 'rmmod ath11k_pci':
Kernel panic - not syncing: dev_wake != 0
CPU: 2 PID: 13535 Comm: procd Not tainted 4.4.60 #1
Hardware name: Generic DT based system
[<80316dac>] (unwind_backtrace) from [<80313700>] (show_stack+0x10/0x14)
[<80313700>] (show_stack) from [<805135dc>] (dump_stack+0x7c/0x9c)
[<805135dc>] (dump_stack) from [<8032136c>] (panic+0x84/0x1f8)
[<8032136c>] (panic) from [<80549b24>] (mhi_pm_disable_transition+0x3b8/0x5b8)
[<80549b24>] (mhi_pm_disable_transition) from [<80549ddc>] (mhi_power_down+0xb8/0x100)
[<80549ddc>] (mhi_power_down) from [<7f5242b0>] (ath11k_mhi_op_status_cb+0x284/0x3ac [ath11k_pci])
[E][__mhi_device_get_sync] Did not enter M0 state, cur_state:RESET pm_state:SHUTDOWN Process
[E][__mhi_device_get_sync] Did not enter M0 state, cur_state:RESET pm_state:SHUTDOWN Process
[E][__mhi_device_get_sync] Did not enter M0 state, cur_state:RESET pm_state:SHUTDOWN Process
[<7f5242b0>] (ath11k_mhi_op_status_cb [ath11k_pci]) from [<7f524878>] (ath11k_mhi_stop+0x10/0x20 [ath11k_pci])
[<7f524878>] (ath11k_mhi_stop [ath11k_pci]) from [<7f525b94>] (ath11k_pci_power_down+0x54/0x90 [ath11k_pci])
[<7f525b94>] (ath11k_pci_power_down [ath11k_pci]) from [<8056b2a8>] (pci_device_shutdown+0x30/0x44)
[<8056b2a8>] (pci_device_shutdown) from [<805cfa0c>] (device_shutdown+0x124/0x174)
[<805cfa0c>] (device_shutdown) from [<8033aaa4>] (kernel_restart+0xc/0x50)
[<8033aaa4>] (kernel_restart) from [<8033ada8>] (SyS_reboot+0x178/0x1ec)
[<8033ada8>] (SyS_reboot) from [<80301b80>] (ret_fast_syscall+0x0/0x34)
Hence, disable wakeup/release mhi using hw_param for other hardwares.
HyperFlash devices in Renesas SoCs use 2-bytes addressing, according
to HW manual paragraph 62.3.3 (which officially describes Serial Flash
access, but seems to be applicable to HyperFlash too). And 1-byte bus
read operations to 2-bytes unaligned addresses in external address space
read mode work incorrectly (returns the other byte from the same word).
Function memcpy_fromio(), used by the driver to read data from the bus,
in ARM64 architecture (to which Renesas cores belong) uses 8-bytes
bus accesses for appropriate aligned addresses, and 1-bytes accesses
for other addresses. This results in incorrect data read from HyperFlash
in unaligned cases.
This issue can be reproduced using something like the following commands
(where mtd1 is a parition on Hyperflash storage, defined properly
in a device tree):
[Incorrect read of the same fragment: see the difference at offsets 8-11]
root@rcar-gen3:~# dd if=/dev/mtd1 of=/tmp/zz bs=12 count=1
root@rcar-gen3:~# hexdump -C /tmp/zz 00000000 f4 03 00 aa f5 03 01 aa 03 03 aa aa |............| 0000000c
Fix this issue by creating a local replacement of the copying function,
that performs only properly aligned bus accesses, and is used for reading
from HyperFlash.
If the IR Toy is receiving IR while a transmit is done, it may end up
hanging. We can prevent this from happening by re-entering sample mode
just before issuing the transmit command.
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:
RP -- UP -- DP -- UP -- DP (hotplug) -- EP
pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
For a Switch, the following must cause a hot reset to be sent on all
Downstream Ports: [...]
* The Data Link Layer of the Upstream Port reporting DL_Down status.
In Switches that support Link speeds greater than 5.0 GT/s, the
Upstream Port must direct the LTSSM of each Downstream Port to the
Hot Reset state, but not hold the LTSSMs in that state. This permits
each Downstream Port to begin Link training immediately after its
hot reset completes. This behavior is recommended for all Switches.
* Receiving a hot reset on the Upstream Port.
Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":
If the Port is enabled for edge-triggered interrupt signaling using
MSI or MSI-X, an interrupt message must be sent every time the logical
AND of the following conditions transitions from FALSE to TRUE: [...]
* The Hot-Plug Interrupt Enable bit in the Slot Control register is
set to 1b.
* At least one hot-plug event status bit in the Slot Status register
and its associated enable bit in the Slot Control register are both
set to 1b.
Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space. Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.
Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.
Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.
Surprise removal during error recovery is detected through a Presence
Detect Changed event. The hotplug port is expected to not signal that
event as a result of a Hot Reset.
The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset(). So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.
[bhelgaas: drop PCI_ERROR_RECOVERY Kconfig, split pm_iter() rename to
preparatory patch] Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/ Fixes: c9252c575e52 ("PCI-Express AER implemetation: AER core and aerdriver") Link: https://lore.kernel.org/r/251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com> Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org # v2.6.19+: 1a95b59f8d60: PCI/portdrv: Report reset for frozen channel Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Not all machines have clflush, so don't go assuming they do.
Not really sure why the clflush is even here since hwsp
is supposed to get snooped I thought.
Although in my case we're talking about a i830 machine where
render/blitter snooping is definitely busted. But it might
work for the hswp perhaps. Haven't really reverse engineered
that one fully.
Cc: stable@vger.kernel.org Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Fixes: bd0777357174 ("drm/i915/gt: Track all timelines created using the HWSP") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211014090941.12159-2-ville.syrjala@linux.intel.com Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Pinned contexts, like the migrate contexts need reset after resume
since their context image may have been lost. Also the GuC needs to
register pinned contexts.
Add a list to struct intel_engine_cs where we add all pinned contexts on
creation, and traverse that list at resume time to reset the pinned
contexts.
This fixes the kms_pipe_crc_basic@suspend-read-crc-pipe-a selftest for now,
but proper LMEM backup / restore is needed for full suspend functionality.
However, note that even with full LMEM backup / restore it may be
desirable to keep the reset since backing up the migrate context images
must happen using memcpy() after the migrate context has become inactive,
and for performance- and other reasons we want to avoid memcpy() from
LMEM.
Also traverse the list at guc_init_lrc_mapping() calling
guc_kernel_context_pin() for the pinned contexts, like is already done
for the kernel context.
v2:
- Don't reset the contexts on each __engine_unpark() but rather at
resume time (Chris Wilson).
v3:
- Reset contexts in the engine sanitize callback. (Chris Wilson)
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Brost Matthew <matthew.brost@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210922062527.865433-6-thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
When a task is doing some modification to the chunk btree and it is not in
the context of a chunk allocation or a chunk removal, it can deadlock with
another task that is currently allocating a new data or metadata chunk.
These contexts are the following:
* When relocating a system chunk, when we need to COW the extent buffers
that belong to the chunk btree;
* When adding a new device (ioctl), where we need to add a new device item
to the chunk btree;
* When removing a device (ioctl), where we need to remove a device item
from the chunk btree;
* When resizing a device (ioctl), where we need to update a device item in
the chunk btree and may need to relocate a system chunk that lies beyond
the new device size when shrinking a device.
The problem happens due to a sequence of steps like the following:
1) Task A starts a data or metadata chunk allocation and it locks the
chunk mutex;
2) Task B is relocating a system chunk, and when it needs to COW an extent
buffer of the chunk btree, it has locked both that extent buffer as
well as its parent extent buffer;
3) Since there is not enough available system space, either because none
of the existing system block groups have enough free space or because
the only one with enough free space is in RO mode due to the relocation,
task B triggers a new system chunk allocation. It blocks when trying to
acquire the chunk mutex, currently held by task A;
4) Task A enters btrfs_chunk_alloc_add_chunk_item(), in order to insert
the new chunk item into the chunk btree and update the existing device
items there. But in order to do that, it has to lock the extent buffer
that task B locked at step 2, or its parent extent buffer, but task B
is waiting on the chunk mutex, which is currently locked by task A,
therefore resulting in a deadlock.
One example report when the deadlock happens with system chunk relocation:
So fix this by making sure that whenever we try to modify the chunk btree
and we are neither in a chunk allocation context nor in a chunk remove
context, we reserve system space before modifying the chunk btree.
Reported-by: Hao Sun <sunhao.th@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CACkBjsax51i4mu6C0C3vJqQN3NR_iVuucoeG3U1HXjrgzn5FFQ@mail.gmail.com/ Fixes: cc2386613a0192 ("btrfs: rework chunk allocation to avoid exhaustion of the system chunk array") CC: stable@vger.kernel.org # 5.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
in dma_buf_release, which could be triggered by user space closing the
dma-buf file description while there are outstanding fence callbacks
from dma_buf_poll.
Unless the controller is not responding at boot or after suspend/resume,
the driver never resets the controller on x86/ACPI platforms. The driver
still requesting the reset pin at probe() though in case it needs it.
Until now the driver has always requested the reset pin with GPIOD_IN
as type. The idea being to put the pin in high-impedance mode to save
power until the driver actually wants to issue a reset.
But this means that just requesting the pin can cause issues, since
requesting it in another mode then GPIOD_ASIS may cause the pinctrl
driver to touch the pin settings. We have already had issues before
due to a bug in the pinctrl-cherryview.c driver which has been fixed in
commit 92baa0b56b77 ("pinctrl: cherryview: Preserve
CHV_PADCTRL1_INVRXTX_TXDATA flag on GPIOs").
And now it turns out that requesting the reset-pin as GPIOD_IN also stops
the touchscreen from working on the GPD P2 max mini-laptop. The behavior
of putting the pin in high-impedance mode relies on there being some
external pull-up to keep it high and there seems to be no pull-up on the
GPD P2 max, causing things to break.
This commit fixes this by requesting the reset pin as is when using
the x86/ACPI code paths to lookup the GPIOs; and by not dropping it
back into input-mode in case the driver does end up issuing a reset
for error-recovery.
Refactor reset handling a bit, change the main reset handler
into a new goodix_reset_no_int_sync() helper and add a
goodix_reset() wrapper which calls goodix_int_sync()
separately.
Also push the dev_err() call on reset failure into the
goodix_reset_no_int_sync() and goodix_int_sync() functions,
so that we don't need to have separate dev_err() calls in
all their callers.
This is a preparation patch for adding support for controllers
without flash, which need to have their firmware uploaded and
need some other special handling too.
Reviewed-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210920150643.155872-4-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add a goodix.h header file, and move the register definitions,
and struct declarations there and add prototypes for various
helper functions.
This is a preparation patch for adding support for controllers
without flash, which need to have their firmware uploaded and
need some other special handling too.
Since MAINTAINERS needs updating because of this change anyways,
also add myself as co-maintainer.
Reviewed-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210920150643.155872-3-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Change the type of the goodix_i2c_write() len parameter to from 'unsigned'
to 'int' to avoid bare use of 'unsigned', changing it to 'int' makes
goodix_i2c_write()' prototype consistent with goodix_i2c_read().
Reviewed-by: Bastien Nocera <hadess@hadess.net> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210920150643.155872-2-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When creating a subvolume, at ioctl.c:create_subvol(), if we fail to
insert the root item for the new subvolume into the root tree, we can
trigger the following warning:
This is because we are calling btrfs_free_tree_block() on an extent
buffer that is dirty. Fix that by cleaning the extent buffer, with
btrfs_clean_tree_block(), before freeing it.
This was triggered by test case generic/475 from fstests.
Fixes: d39804a16fea12 ("btrfs: fix metadata extent leak after failure to create subvolume") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When creating a subvolume, at ioctl.c:create_subvol(), if we fail to
insert the new root's root item into the root tree, we are freeing the
metadata extent we reserved for the new root to prevent a metadata
extent leak, as we don't abort the transaction at that point (since
there is nothing at that point that is irreversible).
However we allocated the metadata extent for the new root which we are
creating for the new subvolume, so its delayed reference refers to the
ID of this new root. But when we free the metadata extent we pass the
root of the subvolume where the new subvolume is located to
btrfs_free_tree_block() - this is incorrect because this will generate
a delayed reference that refers to the ID of the parent subvolume's root,
and not to ID of the new root.
This results in a failure when running delayed references that leads to
a transaction abort and a trace like the following:
Fix this by passing the new subvolume's root ID to btrfs_free_tree_block().
This requires changing the root argument of btrfs_free_tree_block() from
struct btrfs_root * to a u64, since at this point during the subvolume
creation we have not yet created the struct btrfs_root for the new
subvolume, and btrfs_free_tree_block() only needs a root ID and nothing
else from a struct btrfs_root.
This was triggered by test case generic/475 from fstests.
Fixes: d39804a16fea12 ("btrfs: fix metadata extent leak after failure to create subvolume") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In order to make 'real_root' used only in ref-verify it's required to
have the necessary context to perform the same checks that this member
is used for. So add 'mod_root' which will contain the root on behalf of
which a delayed ref was created and a 'skip_group' parameter which
will contain callsite-specific override of skip_qgroup.
Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The user facing function used to allocate new chunks is
btrfs_chunk_alloc, unfortunately there is yet another similar sounding
function - btrfs_alloc_chunk. This creates confusion, especially since
the latter function can be considered "private" in the sense that it
implements the first stage of chunk creation and as such is called by
btrfs_chunk_alloc.
To avoid the awkwardness that comes with having similarly named but
distinctly different in their purpose function rename btrfs_alloc_chunk
to btrfs_create_chunk, given that the main purpose of this function is
to orchestrate the whole process of allocating a chunk - reserving space
into devices, deciding on characteristics of the stripe size and
creating the in-memory structures.
Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Add nft_set_pipapo_match_destroy() helper function to release the
elements in the lookup tables.
Stefano Brivio says: "We additionally look for elements pointers in the
cloned matching data if priv->dirty is set, because that means that
cloned data might point to additional elements we did not commit to the
working copy yet (such as the abort path case, but perhaps not limited
to it)."
Fixes: 81037b76e5aa ("nf_tables: Add set type for arbitrary concatenation of ranges") Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are UAF bugs caused by rose_t0timer_expiry(). The
root cause is that del_timer() could not stop the timer
handler that is running and there is no synchronization.
One of the race conditions is shown below:
The rose_neigh is deallocated in position [1] and use in
position [2].
The crash trace triggered by POC is like below:
BUG: KASAN: use-after-free in expire_timers+0x144/0x320
Write of size 8 at addr ffff888009b19658 by task swapper/0/0
...
Call Trace:
<IRQ>
dump_stack_lvl+0xbf/0xee
print_address_description+0x7b/0x440
print_report+0x101/0x230
? expire_timers+0x144/0x320
kasan_report+0xed/0x120
? expire_timers+0x144/0x320
expire_timers+0x144/0x320
__run_timers+0x3ff/0x4d0
run_timer_softirq+0x41/0x80
__do_softirq+0x233/0x544
...
This patch changes rose_stop_ftimer() and rose_stop_t0timer()
in rose_remove_neigh() to del_timer_sync() in order that the
timer handler could be finished before the resources such as
rose_neigh and so on are deallocated. As a result, the UAF
bugs could be mitigated.
Kuee reported a corner case where the tnum becomes constant after the call
to __reg_bound_offset(), but the register's bounds are not, that is, its
min bounds are still not equal to the register's max bounds.
This in turn allows to leak pointers through turning a pointer register as
is into an unknown scalar via adjust_ptr_min_max_vals().
What can be seen here is that R3=scalar(umin=32767,umax=32768,var_off=(0x7fff;
0x8000)) after the operation R3 += -32767 results in a 'malformed' constant, that
is, R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0)). Intersecting with var_off has
not been done at that point via __update_reg_bounds(), which would have improved
the umax to be equal to umin.
Refactor the tnum <> min/max bounds information flow into a reg_bounds_sync()
helper and use it consistently everywhere. After the fix, bounds have been
corrected to R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) and thus the register
is regarded as a 'proper' constant scalar of 0.
Kuee reported a quirk in the jmp32's jeq/jne simulation, namely that the
register value does not match expectations for the fall-through path. For
example:
As can be seen on line 5 for the branch fall-through path in R2 [*] is that
given condition w2 != 0x8 is false, verifier should conclude that r2 = 8 as
upper 32 bit are known to be zero. However, verifier incorrectly concludes
that r2 = 571 which is far off.
The problem is it only marks false{true}_reg as known in the switch for JE/NE
case, but at the end of the function, it uses {false,true}_{64,32}off to
update {false,true}_reg->var_off and they still hold the prior value of
{false,true}_reg->var_off before it got marked as known. The subsequent
__reg_combine_32_into_64() then propagates this old var_off and derives new
bounds. The information between min/max bounds on {false,true}_reg from
setting the register to known const combined with the {false,true}_reg->var_off
based on the old information then derives wrong register data.
Fix it by detangling the BPF_JEQ/BPF_JNE cases and updating relevant
{false,true}_{64,32}off tnums along with the register marking to known
constant.
The mcp251xfd compatible chips have an erratum ([1], [2]), where the
received CRC doesn't match the calculated CRC. In commit c9c84f308944 ("can: mcp251xfd: mcp251xfd_regmap_crc_read(): work
around broken CRC on TBC register") the following workaround was
implementierend.
- If a CRC read error on the TBC register is detected and the first
byte is 0x00 or 0x80, the most significant bit of the first byte is
flipped and the CRC is calculated again.
- If the CRC now matches, the _original_ data is passed to the reader.
For now we assume transferred data was OK.
New investigations and simulations indicate that the CRC send by the
device is calculated on correct data, and the data is incorrectly
received by the SPI host controller.
Use flipped instead of original data and update workaround description
in mcp251xfd_regmap_crc_read().
[1] mcp2517fd: DS80000792C: "Incorrect CRC for certain READ_CRC commands"
[2] mcp2518fd: DS80000789C: "Incorrect CRC for certain READ_CRC commands"
Link: https://lore.kernel.org/all/DM4PR11MB53901D49578FE265B239E55AFB7C9@DM4PR11MB5390.namprd11.prod.outlook.com Fixes: c9c84f308944 ("can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register") Cc: stable@vger.kernel.org Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
[mkl: split into 2 patches, update patch description and documentation] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The mcp251xfd compatible chips have an erratum ([1], [2]), where the
received CRC doesn't match the calculated CRC. In commit c9c84f308944 ("can: mcp251xfd: mcp251xfd_regmap_crc_read(): work
around broken CRC on TBC register") the following workaround was
implementierend.
- If a CRC read error on the TBC register is detected and the first
byte is 0x00 or 0x80, the most significant bit of the first byte is
flipped and the CRC is calculated again.
- If the CRC now matches, the _original_ data is passed to the reader.
For now we assume transferred data was OK.
Measurements on the mcp2517fd show that the workaround is applicable
not only of the lowest byte is 0x00 or 0x80, but also if 3 least
significant bits are set.
Update check on 1st data byte and workaround description accordingly.
[1] mcp2517fd: DS80000792C: "Incorrect CRC for certain READ_CRC commands"
[2] mcp2518fd: DS80000789C: "Incorrect CRC for certain READ_CRC commands"
Link: https://lore.kernel.org/all/DM4PR11MB53901D49578FE265B239E55AFB7C9@DM4PR11MB5390.namprd11.prod.outlook.com Fixes: c9c84f308944 ("can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register") Cc: stable@vger.kernel.org Reported-by: Pavel Modilaynen <pavel.modilaynen@volvocars.com> Signed-off-by: Thomas Kopp <thomas.kopp@microchip.com>
[mkl: split into 2 patches, update patch description and documentation] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 89c5b0e8093c ("can: m_can: fix periph RX path: use
rx-offload to ensure skbs are sent from softirq context") the RX path
for peripheral devices was switched to RX-offload.
Received CAN frames are pushed to RX-offload together with a
timestamp. RX-offload is designed to handle overflows of the timestamp
correctly, if 32 bit timestamps are provided.
The timestamps of m_can core are only 16 bits wide. So this patch
shifts them to full 32 bit before passing them to RX-offload.
Link: https://lore.kernel.org/all/20220612211410.4081390-1-mkl@pengutronix.de Fixes: 89c5b0e8093c ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context") Cc: <stable@vger.kernel.org> # 5.13 Cc: Torin Cooper-Bennun <torin@maxiluxsystems.com> Reviewed-by: Chandrasekar Ramakrishnan <rcsekar@samsung.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit d9679b438a3b ("can: m_can: m_can_chip_config(): enable and
configure internal timestamps") the timestamping in the m_can core
should be enabled. In peripheral mode, the RX'ed CAN frames, TX
compete frames and error events are sorted by the timestamp.
The above mentioned commit however forgot to enable the timestamping.
Add the missing bits to enable the timestamp counter to the write of
the Timestamp Counter Configuration register.
The gs_usb driver appears to suffer from a malady common to many USB
CAN adapter drivers in that it performs usb_alloc_coherent() to
allocate a number of USB request blocks (URBs) for RX, and then later
relies on usb_kill_anchored_urbs() to free them, but this doesn't
actually free them. As a result, this may be leaking DMA memory that's
been used by the driver.
This commit is an adaptation of the techniques found in the esd_usb2
driver where a similar design pattern led to a memory leak. It
explicitly frees the RX URBs and their DMA memory via a call to
usb_free_coherent(). Since the RX URBs were allocated in the
gs_can_open(), we remove them in gs_can_close() rather than in the
disconnect function as was done in esd_usb2.
For more information, see the 831210df3134 ("can: esd_usb2: fix memory
leak").
Link: https://lore.kernel.org/all/alpine.DEB.2.22.394.2206031547001.1630869@thelappy Fixes: ff8e02f02963 ("can: gs_usb: Added support for the GS_USB CAN devices") Cc: stable@vger.kernel.org Signed-off-by: Rhett Aultman <rhett.aultman@samsara.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 975b617b3758 ("can: bcm: delay release of struct bcm_op
after synchronize_rcu()") Thadeu Lima de Souza Cascardo introduced two
synchronize_rcu() calls in bcm_release() (only once at socket close)
and in bcm_delete_rx_op() (called on removal of each single bcm_op).
Unfortunately this slow removal of the bcm_op's affects user space
applications like cansniffer where the modification of a filter
removes 2048 bcm_op's which blocks the cansniffer application for
40(!) seconds.
In commit 0ec07a041cbb ("can: gw: use call_rcu() instead of costly
synchronize_rcu()") Eric Dumazet replaced the synchronize_rcu() calls
with several call_rcu()'s to safely remove the data structures after
the removal of CAN ID subscriptions with can_rx_unregister() calls.
This patch adopts Erics approach for the can-bcm which should be
applicable since the removal of tasklet_kill() in bcm_remove_op() and
the introduction of the HRTIMER_MODE_SOFT timer handling in Linux 5.4.
Fixes: 975b617b3758 ("can: bcm: delay release of struct bcm_op after synchronize_rcu()") # >= 5.4 Link: https://lore.kernel.org/all/20220520183239.19111-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Cc: Norbert Slusarek <nslusarek@gmx.net> Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The previous cleanup with devres may lead to the incorrect release
orders at the probe error handling due to the devres's nature. Until
we register the card, snd_card_free() has to be called at first for
releasing the stuff properly when the driver tries to manage and
release the stuff via card->private_free().
This patch fixes it by calling snd_card_free() manually on the error
from the probe callback.
Both Behringer UMC 202 HD and 404 HD need explicit quirks to enable
the implicit feedback mode and start the playback stream primarily.
The former seems fixing the stuttering and the latter is required for
a playback-only case.
Note that the "clock source 41 is not valid" error message still
appears even after this fix, but it should be only once at probe.
The reason of the error is still unknown, but this seems to be mostly
harmless as it's a one-off error and the driver retires the clock
setup and it succeeds afterwards.
It will break the bpf self-tests build with:
progs/timer_crash.c:8:19: error: field has incomplete type 'struct bpf_timer'
struct bpf_timer timer;
^
/home/ubuntu/linux/tools/testing/selftests/bpf/tools/include/bpf/bpf_helper_defs.h:39:8:
note: forward declaration of 'struct bpf_timer'
struct bpf_timer;
^
1 error generated.
This test can only be built with 5.17 and newer kernels.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This problem has been fixed on mainline by patch b630484fb760 ("mm: Use
multi-index entries in the page cache") since it deletes the related code.
Fixes: ce869ae5cf28 ("mm: add and use find_lock_entries") Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The fastpath in slab_alloc_node() assumes that c->slab is stable as long as
the TID stays the same. However, two places in __slab_alloc() currently
don't update the TID when deactivating the CPU slab.
If multiple operations race the right way, this could lead to an object
getting lost; or, in an even more unlikely situation, it could even lead to
an object being freed onto the wrong slab's freelist, messing up the
`inuse` counter and eventually causing a page to be freed to the page
allocator while it still contains slab objects.
(I haven't actually tested these cases though, this is just based on
looking at the code. Writing testcases for this stuff seems like it'd be
a pain...)
The race leading to state inconsistency is (all operations on the same CPU
and kmem_cache):
- task A: begin do_slab_free():
- read TID
- read pcpu freelist (==NULL)
- check `slab == c->slab` (true)
- [PREEMPT A->B]
- task B: begin slab_alloc_node():
- fastpath fails (`c->freelist` is NULL)
- enter __slab_alloc()
- slub_get_cpu_ptr() (disables preemption)
- enter ___slab_alloc()
- take local_lock_irqsave()
- read c->freelist as NULL
- get_freelist() returns NULL
- write `c->slab = NULL`
- drop local_unlock_irqrestore()
- goto new_slab
- slub_percpu_partial() is NULL
- get_partial() returns NULL
- slub_put_cpu_ptr() (enables preemption)
- [PREEMPT B->A]
- task A: finish do_slab_free():
- this_cpu_cmpxchg_double() succeeds()
- [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL]
From there, the object on c->freelist will get lost if task B is allowed to
continue from here: It will proceed to the retry_load_slab label,
set c->slab, then jump to load_freelist, which clobbers c->freelist.
But if we instead continue as follows, we get worse corruption:
- task A: run __slab_free() on object from other struct slab:
- CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial)
- task A: run slab_alloc_node() with NUMA node constraint:
- fastpath fails (c->slab is NULL)
- call __slab_alloc()
- slub_get_cpu_ptr() (disables preemption)
- enter ___slab_alloc()
- c->slab is NULL: goto new_slab
- slub_percpu_partial() is non-NULL
- set c->slab to slub_percpu_partial(c)
- [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects
from slab-2]
- goto redo
- node_match() fails
- goto deactivate_slab
- existing c->freelist is passed into deactivate_slab()
- inuse count of slab-1 is decremented to account for object from
slab-2
At this point, the inuse count of slab-1 is 1 lower than it should be.
This means that if we free all allocated objects in slab-1 except for one,
SLUB will think that slab-1 is completely unused, and may free its page,
leading to use-after-free.
Fixes: b8f9d45292f2e ("slub: Separate out kmem_cache_cpu processing from deactivate_slab") Fixes: 4d13caa725630 ("slub: fast release on full slab") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If platform_device_add() fails, it no need to call platform_device_del(), split
platform_device_unregister() into platform_device_del/put(), so platform_device_put()
can be called separately.
Fixes: 5c0abd28a4fe ("ibmaem: new driver for power/energy/temp meters in IBM System X hardware") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220701074153.4021556-1-yangyingliang@huawei.com Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, the response to the power cap command overwrites the
first eight bytes of the poll response, since the commands use
the same buffer. This means that user's get the wrong data between
the time of sending the power cap and the next poll response update.
Fix this by specifying a different buffer for the power cap command
response.
Checksumming of the request and sequence numbering is now done in the
OCC interface driver in order to keep unique sequence numbers. So
remove those in the hwmon driver. Also, add the command length to the
send_cmd function pointer, since the checksum must be placed in the
last two bytes of the command. The submit interface must receive the
exact size of the command - previously it could be rounded to the
nearest 8 bytes with no consequence.
Kernel uapi headers are supposed to use __[us]{8,16,32,64} types defined
by <linux/types.h> as opposed to 'uint32_t' and similar. See [1] for the
relevant discussion about this topic. In this particular case, the usage
of 'uint64_t' escaped headers_check as these macros are not being called
here. However, the following program triggers a compilation error:
#include <drm/drm_fourcc.h>
int main()
{
unsigned long x = AMD_FMT_MOD_CLEAR(RB);
return 0;
}
gcc error:
drm.c:5:27: error: ‘uint64_t’ undeclared (first use in this function)
5 | unsigned long x = AMD_FMT_MOD_CLEAR(RB);
| ^~~~~~~~~~~~~~~~~
This patch changes AMD_FMT_MOD_{SET,CLEAR} macros to use the correct
integer types, which fixes the above issue.
[1] https://lkml.org/lkml/2019/6/5/18
Fixes: 12f6834d9e40 ("drm/fourcc: Add AMD DRM modifiers.") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Simon Ser <contact@emersion.fr> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On some Panasonic models the volume up/down/mute keypresses get
reported both through the Panasonic ACPI HKEY interface as well as
through the atkbd device.
Filter out the atkbd scan-codes for these to avoid reporting presses
twice.
Note normally we would leave the filtering of these to userspace by mapping
the scan-codes to KEY_UNKNOWN through /lib/udev/hwdb.d/60-keyboard.hwdb.
However in this case that would cause regressions since we were filtering
the Panasonic ACPI HKEY events before, so filter these in the kernel.
Fixes: 4ae3aac6548a ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug") Reported-and-tested-by: Stefan Seyfried <seife+kernel@b1-systems.com> Reported-and-tested-by: Kenneth Chan <kenneth.t.chan@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220624112340.10130-7-hdegoede@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The brightness key-presses might also get reported by the ACPI video bus,
check for this and in this case don't report the presses to avoid reporting
2 presses for a single key-press.
Fixes: 4ae3aac6548a ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug") Reported-and-tested-by: Stefan Seyfried <seife+kernel@b1-systems.com> Reported-and-tested-by: Kenneth Chan <kenneth.t.chan@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220624112340.10130-6-hdegoede@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In hindsight blindly throwing away most of the key-press events is not
a good idea. So revert commit 4ae3aac6548a ("platform/x86:
panasonic-laptop: Resolve hotkey double trigger bug").
Fixes: 4ae3aac6548a ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug") Reported-and-tested-by: Stefan Seyfried <seife+kernel@b1-systems.com> Reported-and-tested-by: Kenneth Chan <kenneth.t.chan@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220624112340.10130-5-hdegoede@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In the definition of panasonic_keymap[] the key codes are given in
decimal, later checks are done with hexadecimal values, which does
not help in understanding the code.
Additionally use two helper variables to shorten the code and make
the logic more obvious.
Fixes: 4ae3aac6548a ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug") Signed-off-by: Stefan Seyfried <seife+kernel@b1-systems.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220624112340.10130-3-hdegoede@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org>
In qoriq_cpufreq_probe(), of_find_matching_node() will return a
node pointer with refcount incremented. We should use of_node_put()
when it is not used anymore.
Fixes: 5c1f0af85f55 ("cpufreq: qoriq: convert to a platform driver")
[ Viresh: Fixed Author's name in commit log ] Signed-off-by: Liang He <windhl@126.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Set and increment the sequence number during the submit operation.
This prevents sequence number conflicts between different users of
the interface. A sequence number conflict may result in a user
getting an OCC response meant for a different command. Since the
sequence number is now modified, the checksum must be calculated and
set before submitting the command.
clocksource/drivers/ixp4xx: remove EXPORT_SYMBOL_GPL from ixp4xx_timer_setup()
ixp4xx_timer_setup is exported, and so can not be an __init function.
But it does not need to be exported as it is only called from one
in-kernel function, so just remove the EXPORT_SYMBOL_GPL() marking to
resolve the build warning.
This is fixed "properly" in commit 36d38d3ed0b8
("clocksource/drivers/ixp4xx: Drop boardfile probe path") but that can
not be backported to older kernels as the reworking of the IXP4xx
codebase is not suitable for stable releases.
During the PV driver life cycle the mappings are added to
the RB-tree by set_foreign_p2m_mapping(), which is called from
gnttab_map_refs() and are removed by clear_foreign_p2m_mapping()
which is called from gnttab_unmap_refs(). As both functions end
up calling __set_phys_to_machine_multi() which updates the RB-tree,
this function can be called concurrently.
There is already a "p2m_lock" to protect against concurrent accesses,
but the problem is that the first read of "phys_to_mach.rb_node"
in __set_phys_to_machine_multi() is not covered by it, so this might
lead to the incorrect mappings update (removing in our case) in RB-tree.
In my environment the related issue happens rarely and only when
PV net backend is running, the xen_add_phys_to_mach_entry() claims
that it cannot add new pfn <-> mfn mapping to the tree since it is
already exists which results in a failure when mapping foreign pages.
But there might be other bad consequences related to the non-protected
root reads such use-after-free, etc.
While at it, also fix the similar usage in __pfn_to_mfn(), so
initialize "struct rb_node *n" with the "p2m_lock" held in both
functions to avoid possible bad consequences.
The commit referenced below moved the invocation past the "next" label,
without any explanation. In fact this allows misbehaving backends undue
control over the domain the frontend runs in, as earlier detected errors
require the skb to not be freed (it may be retained for later processing
via xennet_move_rx_slot(), or it may simply be unsafe to have it freed).
This is CVE-2022-33743 / XSA-405.
Fixes: 9bf1756dc970 ("xen networking: add basic XDP support for xen-netfront") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Split the current bounce buffering logic used with persistent grants
into it's own option, and allow enabling it independently of
persistent grants. This allows to reuse the same code paths to
perform the bounce buffering required to avoid leaking contiguous data
in shared pages not part of the request fragments.
Reporting whether the backend is to be trusted can be done using a
module parameter, or from the xenstore frontend path as set by the
toolstack when adding the device.
Bounce all data on the skbs to be transmitted into zeroed pages if the
backend is untrusted. This avoids leaking data present in the pages
shared with the backend but not part of the skb fragments. This
requires introducing a new helper in order to allocate skbs with a
size multiple of XEN_PAGE_SIZE so we don't leak contiguous data on the
granted pages.
Reporting whether the backend is to be trusted can be done using a
module parameter, or from the xenstore frontend path as set by the
toolstack when adding the device.
Rather than use rseq_get_abi() and pass its result through a register to
the inline assembler, directly access the per-thread rseq area through a
memory reference combining the %gs segment selector, the constant offset
of the field in struct rseq, and the rseq_offset value (in a register).
Rather than use rseq_get_abi() and pass its result through a register to
the inline assembler, directly access the per-thread rseq area through a
memory reference combining the %fs segment selector, the constant offset
of the field in struct rseq, and the rseq_offset value (in a register).
gcc and clang each have their own compiler bugs with respect to asm
goto. Implement a work-around for compiler versions known to have those
bugs.
gcc prior to 4.8.2 miscompiles asm goto.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
gcc prior to 8.1.0 miscompiles asm goto at O1.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103908
clang prior to version 13.0.1 miscompiles asm goto at O2.
https://github.com/llvm/llvm-project/issues/52735
Work around these issues by adding a volatile inline asm with
memory clobber in the fallthrough after the asm goto and at each
label target. Emit this for all compilers in case other similar
issues are found in the future.
The arm and mips work-around for asm goto size guess issues are not
properly documented, and lack reference to specific compiler versions,
upstream compiler bug tracker entry, and reproducer.
I can only find a loosely documented patch in my original LKML rseq post
refering to gcc < 7 on ARM, but it does not appear to be sufficient to
track the exact issue. Also, I am not sure MIPS really has the same
limitation.
Therefore, remove the work-around until we can properly document this.
The semantic of off_t is for file offsets. We mean to use it as an
offset from a pointer. We really expect it to fit in a single register,
and not use a 64-bit type on 32-bit architectures.
Fix runtime issues on ppc32 where the offset is always 0 due to
inconsistency between the argument type (off_t -> 64-bit) and type
expected by the inline assembler (32-bit).
Building the rseq basic test with
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12)
Target: powerpc-linux-gnu
leads to these errors:
/tmp/ccieEWxU.s: Assembler messages:
/tmp/ccieEWxU.s:118: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:118: Error: junk at end of line: `,8'
/tmp/ccieEWxU.s:121: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:121: Error: junk at end of line: `,8'
/tmp/ccieEWxU.s:626: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:626: Error: junk at end of line: `,8'
/tmp/ccieEWxU.s:629: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:629: Error: junk at end of line: `,8'
/tmp/ccieEWxU.s:735: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:735: Error: junk at end of line: `,8'
/tmp/ccieEWxU.s:738: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:738: Error: junk at end of line: `,8'
/tmp/ccieEWxU.s:741: Error: syntax error; found `,', expected `('
/tmp/ccieEWxU.s:741: Error: junk at end of line: `,8'
Makefile:581: recipe for target 'basic_percpu_ops_test.o' failed
Based on discussion with Linux powerpc maintainers and review of
the use of the "m" operand in powerpc kernel code, add the missing
%Un%Xn (where n is operand number) to the lwz, stw, ld, and std
instructions when used with "m" operands.
Using "WORD" to mean either a 32-bit or 64-bit type depending on
the architecture is misleading. The term "WORD" really means a
32-bit type in both 32-bit and 64-bit powerpc assembler. The intent
here is to wrap load/store to intptr_t into common macros for both
32-bit and 64-bit.
Rename the macros with a RSEQ_ prefix, and use the terms "INT"
for always 32-bit type, and "LONG" for architecture bitness-sized
type.
glibc-2.35 (upcoming release date 2022-02-01) exposes the rseq per-thread
data in the TCB, accessible at an offset from the thread pointer, rather
than through an actual Thread-Local Storage (TLS) variable, as the
Linux kernel selftests initially expected.
The __rseq_abi TLS and glibc-2.35's ABI for per-thread data cannot
actively coexist in a process, because the kernel supports only a single
rseq registration per thread.
Here is the scheme introduced to ensure selftests can work both with an
older glibc and with glibc-2.35+:
- librseq exposes its own "rseq_offset, rseq_size, rseq_flags" ABI.
- librseq queries for glibc rseq ABI (__rseq_offset, __rseq_size,
__rseq_flags) using dlsym() in a librseq library constructor. If those
are found, copy their values into rseq_offset, rseq_size, and
rseq_flags.
- Else, if those glibc symbols are not found, handle rseq registration
from librseq and use its own IE-model TLS to implement the rseq ABI
per-thread storage.
This is done in preparation for the selftest uplift to become compatible
with glibc-2.35.
glibc-2.35 exposes the rseq per-thread data in the TCB, accessible
at an offset from the thread pointer.
The toolchains do not implement accessing the thread pointer on all
architectures. Provide thread pointer getters for ppc and x86 which
lack (or lacked until recently) toolchain support.
This is done in preparation for the selftest uplift to become compatible
with glibc-2.35.
glibc-2.35 exposes the rseq per-thread data in the TCB, accessible
at an offset from the thread pointer, rather than through an actual
Thread-Local Storage (TLS) variable, as the kernel selftests initially
expected.
Introduce a rseq_get_abi() helper, initially using the __rseq_abi
TLS variable, in preparation for changing this userspace ABI for one
which is compatible with glibc-2.35.
Note that the __rseq_abi TLS and glibc-2.35's ABI for per-thread data
cannot actively coexist in a process, because the kernel supports only
a single rseq registration per thread.
This is done in preparation for the selftest uplift to become compatible
with glibc-2.35.
All accesses to the __rseq_abi fields are volatile, but remove the
volatile from the TLS variable declaration, otherwise we are stuck with
volatile for the upcoming rseq_get_abi() helper.
The Linux kernel rseq uapi header has a broken layout for the
rseq_cs.ptr field on 32-bit little endian architectures. The entire
rseq_cs.ptr field is planned for removal, leaving only the 64-bit
rseq_cs.ptr64 field available.
Both glibc and librseq use their own copy of the Linux kernel uapi
header, where they introduce proper union fields to access to the 32-bit
low order bits of the rseq_cs pointer on 32-bit architectures.
Introduce a copy of the Linux kernel uapi headers in the Linux kernel
selftests.
ARRAY_SIZE is defined in several selftests. Remove definitions from
individual test files and include header file for the define instead.
ARRAY_SIZE define is added in a separate patch to prepare for this
change.
Remove ARRAY_SIZE from rseq tests and pickup the one defined in
kselftest.h.
This allows us to add tests (esp. negative tests) where we only want to
ensure the program doesn't pass through the verifier, and also verify
the error. The next commit will add the tests making use of this.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220114163953.1455836-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[PHLin: backport due to lack of fixup_map_timer] Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the third packet of 3WHS connection establishment
contains payload, it is added into socket receive queue
without the XFRM check and the drop of connection tracking
context.
This means that if the data is left unread in the socket
receive queue, conntrack module can not be unloaded.
As most applications usually reads the incoming data
immediately after accept(), bug has been hiding for
quite a long time.
Commit b846c181f3b2 ("net: generalize skb freeing
deferral to per-cpu lists") exposed this bug because
even if the application reads this data, the skb
with nfct state could stay in a per-cpu cache for
an arbitrary time, if said cpu no longer process RX softirqs.
Many thanks to Ilya Maximets for reporting this issue,
and for testing various patches:
https://lore.kernel.org/netdev/20220619003919.394622-1-i.maximets@ovn.org/
Note that I also added a missing xfrm4_policy_check() call,
although this is probably not a big issue, as the SYN
packet should have been dropped earlier.
Leah Rumancik [Thu, 30 Jun 2022 20:52:28 +0000 (13:52 -0700)]
MAINTAINERS: add Leah as xfs maintainer for 5.15.y
Update MAINTAINERS for xfs in an effort to help direct bots/questions
about xfs in 5.15.y.
Note: 5.10.y and 5.4.y will have different updates to their
respective MAINTAINERS files for this effort.
Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Eric reports that syzbot made short work out of my speculative
fix. Indeed when queue gets detached its tfile->tun remains,
so we would try to stop NAPI twice with a detach(), close()
sequence.
Alternative fix would be to move tun_napi_disable() to
tun_detach_all() and let the NAPI run after the queue
has been detached.
Fixes: 6760cd5a7546 ("net: tun: stop NAPI when detaching queues") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220629181911.372047-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In mlxsw_sp_nexthop6_init(), a next hop is always added to the router
linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When
that function results in an error, the next hop will not have been removed
from the linked list. As the error is propagated upwards and the caller
frees the next hop object, the linked list ends up holding an invalid
object.
A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback
block does exist, however does not include the linked list removal.
Both IPv6 and IPv4 next hops have a similar issue with next-hop counter
rollbacks. As these were introduced in the same patchset as the next hop
linked list, include the cleanup in this patch.
Fixes: 4e4a8cbe8a97 ("mlxsw: spectrum_router: Keep nexthops in a linked list") Fixes: 57b8fd70b9c8 ("mlxsw: spectrum: Add support for setting counters on nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Recently added debug in commit 3c8fe2678377 ("net: warn if mac header
was not set") caught a bug in skb_tunnel_check_pmtu(), as shown
in this syzbot report [1].
In ndo_start_xmit() paths, there is really no need to use skb->mac_header,
because skb->data is supposed to point at it.
Fixes: 6aabe1545775 ("tunnels: PMTU discovery support for directly bridged IP packets") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some systems have an ACPI video bus but not ACPI video devices with
backlight capability. On these devices brightness key-presses are
(logically) not reported through the ACPI video bus.
Change how acpi_video_handles_brightness_key_presses() determines if
brightness key-presses are handled by the ACPI video driver to avoid
vendor specific drivers/platform/x86 drivers filtering out their
brightness key-presses even though they are the only ones reporting
these presses.
Fixes: 4ae3aac6548a ("platform/x86: panasonic-laptop: Resolve hotkey double trigger bug") Reported-and-tested-by: Stefan Seyfried <seife+kernel@b1-systems.com> Reported-and-tested-by: Kenneth Chan <kenneth.t.chan@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220624112340.10130-2-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>