The dwc3 driver manages its PHYs itself so the USB core PHY management
needs to be disabled.
Use the struct xhci_plat_priv hack added by commits f7e8642d7ea0 ("usb:
host: xhci-plat: add platform data support") and 4a62bddde4d0 ("usb:
host: xhci-plat: add priv quirk for skip PHY initialization") to
propagate the setting for now.
Fixes: ff1ad80a153f ("usb: add a flag to skip PHY initialization to struct usb_hcd") Fixes: 355592d55723 ("usb: core: hcd: integrate the PHY wrapper into the HCD core") Tested-by: Matthias Kaehlcke <mka@chromium.org> Cc: stable <stable@kernel.org> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20220825131836.19769-1-johan+linaro@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ johan: adjust context to 5.15 ] Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Identifying and removing the stale device from the fs_uuids list is done
by btrfs_free_stale_devices(). btrfs_free_stale_devices() in turn
depends on device_path_matched() to check if the device appears in more
than one btrfs_device structure.
The matching of the device happens by its path, the device path. However,
when device mapper is in use, the dm device paths are nothing but a link
to the actual block device, which leads to the device_path_matched()
failing to match.
Fix this by matching the dev_t as provided by lookup_bdev() instead of
plain string compare of the device paths.
Reported-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's been reported that there is a possible data-race accessing to the
global card_requested[] array at ALSA sequencer core, which is used
for determining whether to call request_module() for the card or not.
This data race itself is almost harmless, as it might end up with one
extra request_module() call for the already loaded module at most.
But it's still better to fix.
This patch addresses the possible data race of card_requested[] and
client_requested[] arrays by replacing them with bitmask.
It's an atomic operation and can work without locks.
ALSA OSS sequencer refers to a global variable max_midi_devs at
creating a new port, storing it to its own field. Meanwhile this
variable may be changed by other sequencer events at
snd_seq_oss_midi_check_exit_port() in parallel, which may cause a data
race.
OTOH, this data race itself is almost harmless, as the access to the
MIDI device is done via get_mdev() and it's protected with a refcount,
hence its presence is guaranteed.
Though, it's sill better to address the data-race from the code sanity
POV, and this patch adds the proper spinlock for the protection.
Upon reception, a packet must be categorized, either it's destination is
the host, or it is another host. A packet with no destination addressing
fields may be valid in two situations:
- the packet has no source field: only ACKs are built like that, we
consider the host as the destination.
- the packet has a valid source field: it is directed to the PAN
coordinator, as for know we don't have this information we consider we
are not the PAN coordinator.
There was likely a copy/paste error made during a previous cleanup
because the if clause is now containing exactly the same condition as in
the switch case, which can never be true. In the past the destination
address was used in the switch and the source address was used in the
if, which matches what the spec says.
Cc: stable@vger.kernel.org Fixes: 5dc22bac832c ("ieee802154: use ieee802154_addr instead of *_sa variants") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/r/20220826142954.254853-1-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
My understanding is that fib_validate_source() is used to know if the src
address and the gateway address are on the same link. For that,
fib_validate_source() returns 1 (same link) or 0 (not the same network).
__mkroute_input() is the only user of these positive values, all other
callers only look if the returned value is negative.
Since the below patch, fib_validate_source() didn't return anymore 1 when
both addresses are on the same network, because the route lookup returns
RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
Let's adapat the test to return 1 again when both addresses are on the same
link.
CC: stable@vger.kernel.org Fixes: 4be009aa4a33 ("ip: fix dflt addr selection for connected nexthop") Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Heng Qi <hengqi@linux.alibaba.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we are not connected to a channel, sending channel "switch"
announcement doesn't make any sense.
The BSS list is empty in that case. This causes the for loop in
cfg80211_get_bss() to be bypassed, so the function returns NULL
(check line 1424 of net/wireless/scan.c), causing the WARN_ON()
in ieee80211_ibss_csa_beacon() to get triggered (check line 500
of net/mac80211/ibss.c), which was consequently reported on the
syzkaller dashboard.
Thus, check if we have an existing connection before generating
the CSA beacon in ieee80211_ibss_finish_csa().
Both __device_attach_driver() and __driver_attach() check the return
code of the bus_type.match() function to see if the device needs to be
added to the deferred probe list. After adding the device to the list,
the logic attempts to bind the device to the driver anyway, as if the
device had matched with the driver, which is not correct.
If __device_attach_driver() detects that the device in question is not
ready to match with a driver on the bus, then it doesn't make sense for
the device to attempt to bind with the current driver or continue
attempting to match with any of the other drivers on the bus. So, update
the logic in __device_attach_driver() to reflect this.
If __driver_attach() detects that a driver tried to match with a device
that is not ready to match yet, then the driver should not attempt to bind
with the device. However, the driver can still attempt to match and bind
with other devices on the bus, as drivers can be bound to multiple
devices. So, update the logic in __driver_attach() to reflect this.
During cdrom emulation, the response to read_toc command must contain
the cdrom address as the number of sectors (2048 byte sized blocks)
represented either as an absolute value (when MSF bit is '0') or in
terms of PMin/PSec/PFrame (when MSF bit is set to '1'). Incase of
cdrom, the fsg_lun_open call sets the sector size to 2048 bytes.
When MAC OS sends a read_toc request with MSF set to '1', the
store_cdrom_address assumes that the address being provided is the
LUN size represented in 512 byte sized blocks instead of 2048. It
tries to modify the address further to convert it to 2048 byte sized
blocks and store it in MSF format. This results in data transfer
failures as the cdrom address being provided in the read_toc response
is incorrect.
This turned out not to be an error in usb-storage but rather a nested
device reset attempt. That is, as the rtl8712 driver was being
unbound from a composite device in preparation for an unrelated USB
reset (that driver does not have pre_reset or post_reset callbacks),
its ->remove routine called usb_reset_device() -- thus nesting one
reset call within another.
Performing a reset as part of disconnect processing is a questionable
practice at best. However, the bug report points out that the USB
core does not have any protection against nested resets. Adding a
reset_in_progress flag and testing it will prevent such errors in the
future.
Add proper alignment for .nospec_call_table and .nospec_return_table in
vmlinux.
[hca@linux.ibm.com]: The problem with the missing alignment of the nospec
tables exist since a long time, however only since commit de132d8600c1
("s390/alternatives: remove padding generation code") and with
CONFIG_RELOCATABLE=n the kernel may also crash at boot time.
The above named commit reduced the size of struct alt_instr by one byte,
so its new size is 11 bytes. Therefore depending on the number of cpu
alternatives the size of the __alt_instructions array maybe odd, which
again also causes that the addresses of the nospec tables will be odd.
If the address of __nospec_call_start is odd and the kernel is compiled
With CONFIG_RELOCATABLE=n the compiler may generate code that loads the
address of __nospec_call_start with a 'larl' instruction.
This will generate incorrect code since the 'larl' instruction only works
with even addresses. In result the members of the nospec tables will be
accessed with an off-by-one offset, which subsequently may lead to
addressing exceptions within __nospec_revert().
The alignment check in prepare_hugepage_range() is wrong for 2 GB
hugepages, it only checks for 1 MB hugepage alignment.
This can result in kernel crash in __unmap_hugepage_range() at the
BUG_ON(start & ~huge_page_mask(h)) alignment check, for mappings
created with MAP_FIXED at unaligned address.
Fix this by correctly handling multiple hugepage sizes, similar to the
generic version of prepare_hugepage_range().
Fixes: 08d09e313509 ("s390/mm: add support for 2GB hugepages") Cc: <stable@vger.kernel.org> # 4.8+ Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since 079d2658e5a8 ("phy: core: Warn when phy_power_on is called before
phy_init") the driver complains. In my case (Amlogic SoC) the warning
is: phy phy-fe03e000.phy.2: phy_power_on was called before phy_init
So change the order of the two calls. The same change has to be done
to the order of phy_exit() and phy_power_off().
Fix incorrect pin assignment values when connecting to a monitor with
Type-C receptacle instead of a plug.
According to specification, an UFP_D receptacle's pin assignment
should came from the UFP_D pin assignments field (bit 23:16), while
an UFP_D plug's assignments are described in the DFP_D pin assignments
(bit 15:8) during Mode Discovery.
For example the LG 27 UL850-W is a monitor with Type-C receptacle.
The monitor responds to MODE DISCOVERY command with following
DisplayPort Capability flag:
dp->alt->vdo=0x140045
The existing logic only take cares of UPF_D plug case,
and would take the bit 15:8 for this 0x140045 case.
This results in an non-existing pin assignment 0x0 in
dp_altmode_configure.
To fix this problem a new set of macros are introduced
to take plug/receptacle differences into consideration.
Fixes: 5e5a5296796c ("usb: typec: Add driver for DisplayPort alternate mode") Cc: stable@vger.kernel.org Co-developed-by: Pablo Sun <pablo.sun@mediatek.com> Co-developed-by: Macpaul Lin <macpaul.lin@mediatek.com> Reviewed-by: Guillaume Ranquet <granquet@baylibre.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Pablo Sun <pablo.sun@mediatek.com> Signed-off-by: Macpaul Lin <macpaul.lin@mediatek.com> Link: https://lore.kernel.org/r/20220804034803.19486-1-macpaul.lin@mediatek.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After xHC controller is started, either in probe or resume, it can take
a while before any of the connected usb devices are visible to the roothub
due to link training.
It's possible xhci driver loads, sees no acivity and suspends the host
before the USB device is visible.
In one testcase with a hotplugged xHC controller the host finally detected
the connected USB device and generated a wake 500ms after host initial
start.
If hosts didn't suspend the device duringe training it probablty wouldn't
take up to 500ms to detect it, but looking at specs reveal USB3 link
training has a couple long timeout values, such as 120ms
RxDetectQuietTimeout, and 360ms PollingLFPSTimeout.
So Add a 500ms grace period that keeps polling the roothub for 500ms after
start, preventing runtime suspend until USB devices are detected.
The received notification packet is held in pkg->buffer and not in pkg
itself. Fix this by using the correct buffer.
Fixes: 8c988d3f3807 ("thunderbolt: Let the connection manager handle all notifications") Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The regcache sync will set the cache_bypass = true, at that
time, when there is regmap write operation, it will bypass
the regmap cache, then the regcache sync will write back the
value from cache to register, which is not as our expectation.
Though regmap already use its internal lock to avoid such issue,
but this driver force disable the regmap internal lock in its
regmap config: disable_locking = true
To avoid this issue, use the driver's own lock to do the protect
in system PM.
The driver does not check if the cooling state passed to
gpio_fan_set_cur_state() exceeds the maximum cooling state as
stored in fan_data->num_speeds. Since the cooling state is later
used as an array index in set_fan_speed(), an array out of bounds
access can occur.
This can be exploited by setting the state of the thermal cooling device
to arbitrary values, causing for example a kernel oops when unavailable
memory is accessed this way.
The function raspberrypi_fw_get_rate (e.g. used for the recalc_rate
hook) can fail to get the clock rate from the firmware. In this case
we cannot return a signed error value, which would be casted to
unsigned long. Fix this by returning 0 instead.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://lore.kernel.org/r/20220625083643.4012-1-stefan.wahren@i2se.com Fixes: 69626d8a1213 ("clk: bcm283x: add driver interfacing with Raspberry Pi's firmware") Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the original commit d2cb18dd015a ("clk: Add support for runtime PM"),
the commit message mentioned that pm_runtime_put_sync() would be done
at the end of clk_core_unprepare(). This mirrors the operations in
clk_core_prepare() in the opposite order.
However, the actual code that was added wasn't in the order the commit
message described. Move clk_pm_runtime_put() to the end of
clk_core_unprepare() so that it is in the correct order.
Fixes: d2cb18dd015a ("clk: Add support for runtime PM") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20220822081424.1310926-3-wenst@chromium.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit c71b9d23a158a863ff439955c234951b5f390270. Alexander
reports that it causes boot failures on i.MX8M Plus based boards
(specifically imx8mp-tqma8mpql-mba8mpxl.dts).
In the previous commits that added CLK_OPS_PARENT_ENABLE, support for
this flag was only added to rate change operations (rate setting and
reparent) and disabling unused subtree. It was not added to the
clock gate related operations. Any hardware driver that needs it for
these operations will either see bogus results, or worse, hang.
This has been seen on MT8192 and MT8195, where the imp_ii2_* clk
drivers set this, but dumping debugfs clk_summary would cause it
to hang.
Fixes: fa3cfe25c7be ("clk: core: support clocks which requires parents enable (part 2)") Fixes: 182b7ef65193 ("clk: core: support clocks which requires parents enable (part 1)") Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Link: https://lore.kernel.org/r/20220822081424.1310926-2-wenst@chromium.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
There is a spelling mistake in a gvt_vgpu_err error message. Fix it.
Fixes: 3104ff85dc8c ("drm/i915/gvt: replace the gvt_err with gvt_vgpu_err") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220315202449.2952845-1-colin.i.king@gmail.com Reviewed-by: Zhi Wang <zhi.a.wang@intel.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The Qualcomm dwc3 runtime-PM implementation checks the xhci
platform-device pointer in the wakeup-interrupt handler to determine
whether the controller is in host mode and if so triggers a resume.
After a role switch in OTG mode the xhci platform-device would have been
freed and the next wakeup from runtime suspend would access the freed
memory.
Note that role switching is executed from a freezable workqueue, which
guarantees that the pointer is stable during suspend.
Also note that runtime PM has been broken since commit 5b1fe5eb5e5c
("usb: dwc3: qcom: Honor wakeup enabled/disabled state"), which
incidentally also prevents this issue from being triggered.
A transaction of type BINDER_TYPE_WEAK_HANDLE can fail to increment the
reference for a node. In this case, the target proc normally releases
the failed reference upon close as expected. However, if the target is
dying in parallel the call will race with binder_deferred_release(), so
the target could have released all of its references by now leaving the
cleanup of the new failed reference unhandled.
The transaction then ends and the target proc gets released making the
ref->proc now a dangling pointer. Later on, ref->node is closed and we
attempt to take spin_lock(&ref->proc->inner_lock), which leads to the
use-after-free bug reported below. Let's fix this by cleaning up the
failed reference on the spot instead of relying on the target to do so.
==================================================================
BUG: KASAN: use-after-free in _raw_spin_lock+0xa8/0x150
Write of size 4 at addr ffff5ca207094238 by task kworker/1:0/590
The probe session-duplication overflow check incremented the session
count also when there were no more available sessions so that memory
beyond the fixed-size slab-allocated session array could be corrupted in
fastrpc_session_alloc() on open().
Add the missing sanity check on the probed-session count to avoid
corrupting memory beyond the fixed-size slab-allocated session array
when there are more than FASTRPC_MAX_SESSIONS sessions defined in the
devicetree.
syzbot is reporting hung task at __input_unregister_device() [1], for
iforce_close() waiting at wait_event_interruptible() with dev->mutex held
is blocking input_disconnect_device() from __input_unregister_device().
It seems that the cause is simply that commit 4de00304323c8661 ("Input:
iforce - wait for command completion when closing the device") forgot to
call wake_up() after clear_bit().
Fix this problem by introducing a helper that calls clear_bit() followed
by wake_up_all().
Reported-by: syzbot <syzbot+deb6abc36aad4008f407@syzkaller.appspotmail.com> Fixes: 4de00304323c8661 ("Input: iforce - wait for command completion when closing the device") Tested-by: syzbot <syzbot+deb6abc36aad4008f407@syzkaller.appspotmail.com> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Co-developed-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Link: https://lore.kernel.org/r/887021c3-4f13-40ce-c8b9-aa6e09faa3a7@I-love.SAKURA.ne.jp Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the user initializes the uart port, and waits for the transmit
engine to complete in lpuart32_set_termios(), if the UART TX fifo has
dirty data and the UARTMODIR enable the flow control, the TX fifo may
never be empty. So here we should disable the flow control first to make
sure the transmit engin can complete.
When changing the console font with ioctl(KDFONTOP) the new font size
can be bigger than the previous font. A previous selection may thus now
be outside of the new screen size and thus trigger out-of-bounds
accesses to graphics memory if the selection is removed in
vc_do_resize().
Prevent such out-of-memory accesses by dropping the selection before the
various con_font_set() console handlers are called.
Christophe Leroy reported that commit 202351226f7b ("kbuild: link
symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") broke
mpc85xx_defconfig + CONFIG_RELOCATABLE=y.
The compiler emits a bunch of R_PPC_UADDR32, which is not supported by
arch/powerpc/kernel/reloc_32.S.
The reason is there exists an unaligned symbol.
$ powerpc-linux-gnu-nm -n vmlinux
... c0b31258 d spe_aligninfo c0b31298 d __func__.0 c0b312a9 D sys_call_table c0b319b8 d __func__.0
Commit 202351226f7b is not the root cause. Even before that, I can
reproduce the same issue for mpc85xx_defconfig + CONFIG_RELOCATABLE=y
+ CONFIG_MODVERSIONS=n.
It is just that nobody noticed because when CONFIG_MODVERSIONS is
enabled, a __crc_* symbol inserted before sys_call_table was hiding the
unalignment issue.
Adding alignment to the syscall table for ppc32 fixes the issue.
_Read/Write_MACREG callbacks are NULL so the read/write_macreg_hdl()
functions don't do anything except free the "pcmd" pointer. It
results in a use after free. Delete them.
Fixes: 91786735aed5 ("staging: r8712u: Add the new driver to the mainline kernel") Cc: stable <stable@kernel.org> Reported-by: Zheng Wang <hackerzheng666@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/Yw4ASqkYcUhUfoY2@kili Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The setting of RS485 RTS polarity is inverse in the current driver.
When the property of 'rs485-rts-active-low' is enabled in the dts node,
the RTS signal should be LOW during sending. Otherwise, if there is no
such a property, the RTS should be HIGH during sending.
Fixes: 108c7ba32a62 ("tty: serial: fsl_lpuart: Add support for RS-485") Cc: stable <stable@kernel.org> Signed-off-by: Nicolas Diaz <nicolas.diaz@nxp.com> Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Link: https://lore.kernel.org/r/20220805144529.604856-1-shenwei.wang@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For passive connections, the refcount increment has been done in
smc_clcsock_accept()-->smc_sock_alloc().
Fixes: 77629c0b5b0d ("net/smc: restructure client and server code in af_smc") Signed-off-by: Yacan Liu <liuyacan@corp.netease.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the GSO splitting feature of sch_cake is enabled, GSO superpackets
will be broken up and the resulting segments enqueued in place of the
original skb. In this case, CAKE calls consume_skb() on the original skb,
but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
assuming the original skb still exists, when it really has been freed. Fix
this by adding the __NET_XMIT_STOLEN flag to the return value in this case.
strp_init() is called just a few lines above this csk->sk_user_data
check, it also initializes strp->work etc., therefore, it is
unnecessary to call strp_done() to cancel the freshly initialized
work.
And if sk_user_data is already used by KCM, psock->strp should not be
touched, particularly strp->work state, so we need to move strp_init()
after the csk->sk_user_data check.
This also makes a lockdep warning reported by syzbot go away.
Reported-and-tested-by: syzbot+9fc084a4348493ef65d2@syzkaller.appspotmail.com Reported-by: syzbot+e696806ef96cdd2d87cd@syzkaller.appspotmail.com Fixes: 200748222a37 ("kcm: Check if sk_user_data already set in kcm_attach") Fixes: ab80e5255f9e ("kcm: Call strp_stop before strp_done in kcm_attach") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Link: https://lore.kernel.org/r/20220827181314.193710-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The function neigh_timer_handler() is a timer handler that runs in an
atomic context. When used by rocker, neigh_timer_handler() calls
"kzalloc(.., GFP_KERNEL)" that may sleep. As a result, the sleep in
atomic context bug will happen. One of the processes is shown below:
This patch changes the gfp_t parameter of kzalloc() from GFP_KERNEL to
GFP_ATOMIC in order to mitigate the bug.
Fixes: 191c0c7c9731 ("rocker: Change world_ops API and implementation to be switchdev independant") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The issue is the same to commit a15da784f8b0 ("net: sched: multiq: don't
call qdisc_put() while holding tree lock"). Qdiscs call qdisc_put() while
holding sch tree spinlock, which results sleeping-while-atomic BUG.
Fixes: d30d424a9fdb ("net: sched: protect block state with mutex") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220826013930.340121-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Turning off port power in shutdown did cause issues such as a laptop not
proprly powering off, and some specific usb devies failing to enumerate the
subsequent boot after a warm reset.
So revert this.
Fixes: a505231d57b4 ("xhci: turn off port power in shutdown") Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20220825150840.132216-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The return type is supposed to be ssize_t, which is signed long,
but "r" was declared as unsigned int. This means that on 64 bit systems
we return positive values instead of negative error codes.
The root cause for this race is that the upper layer (ieee802154) is
unaware of this detaching event and the function adf7242_channel can
be called without any checks.
To fix this, we can add a flag write at the beginning of adf7242_remove
and add flag check in adf7242_channel. Or we can just defer the
destructive operation like other commit 7c1da104def9 ("hamradio: defer
ax25 kfree after unregister_netdev") which let the
ieee802154_unregister_hw() to handle the synchronization. This patch
takes the second option.
Fixes: abbb7436848e ("net: ieee802154: adf7242: Fix OCL calibration
runs") Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Michael Hennerich <michael.hennerich@analog.com> Link: https://lore.kernel.org/r/20220808034224.12642-1-linma@zju.edu.cn Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On Intel hardware the SLP_TYPx bitfield occupies bits 10-12 as per ACPI
specification (see Table 4.13 "PM1 Control Registers Fixed Hardware
Feature Control Bits" for the details).
Fix the mask and other related definitions accordingly.
Fixes: b699f9a5fe79 ("x86/platform: New Intel Atom SOC power management controller driver") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220801113734.36131-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When unplugging an Ethernet cable, false carrier events were produced by
the PHY at a very high rate. Once the false carrier counter full, an
interrupt was triggered every few clock cycles until the cable was
replugged. This resulted in approximately 10k/s interrupts.
Since the false carrier counter (FCSCR) is never used, we can safely
disable this interrupt.
In addition to improving performance, this also solved MDIO read
timeouts I was randomly encountering with an i.MX8 fec MAC because of
the interrupt flood. The interrupt count and MDIO timeout fix were
tested on a v5.4.110 kernel.
Fixes: 60ce3f0822f5 ("net: phy: DP83822 initial driver submission") Signed-off-by: Enguerrand de Ribaucourt <enguerrand.de-ribaucourt@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 89929dc2956d ("mm: kmemleak: take a full lowmem check in
kmemleak_*_phys()") brought false leak alarms on some archs like arm64
that does not init pfn boundary in early booting. The final solution
lands on linux-6.0: commit ae2a1f131a41 ("mm: kmemleak: add rbtree and
store physical address for objects allocated with PA").
Revert this commit before linux-6.0. The original issue of invalid PA
can be mitigated by additional check in devicetree.
Commit a8d654978b92 ("add barriers to buffer_uptodate and
set_buffer_uptodate") added proper memory barriers to the buffer head
BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date
will be guaranteed to actually see initialized state.
However, that commit didn't _just_ add the memory barrier, it also ended
up dropping the "was it already set" logic that the BUFFER_FNS() macro
had.
That's conceptually the right thing for a generic "this is a memory
barrier" operation, but in the case of the buffer contents, we really
only care about the memory barrier for the _first_ time we set the bit,
in that the only memory ordering protection we need is to avoid anybody
seeing uninitialized memory contents.
Any other access ordering wouldn't be about the BH_Uptodate bit anyway,
and would require some other proper lock (typically BH_Lock or the folio
lock). A reader that races with somebody invalidating the buffer head
isn't an issue wrt the memory ordering, it's a serialization issue.
Now, you'd think that the buffer head operations don't matter in this
day and age (and I certainly thought so), but apparently some loads
still end up being heavy users of buffer heads. In particular, the
kernel test robot reported that not having this bit access optimization
in place caused a noticeable direct IO performance regression on ext4:
When calling debugfs_lookup() the result must have dput() called on it,
otherwise the memory will leak over time. Fix this up to be much
simpler logic and only create the root debugfs directory once when the
driver is first accessed. That resolves the memory leak and makes
things more obvious as to what the intent is.
Cc: Marcin Wojtas <mw@semihalf.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Cc: stable <stable@kernel.org> Fixes: 39d7629f2ab8 ("net: mvpp2: add a debugfs interface for the Header Parser") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit c11760c06374ae3fc740a9eccbd4711ec0c93d4e as
it can cause invalid link quality command sent to the firmware
and address the off-by-one issue by fixing condition of while loop.
Cc: stable@vger.kernel.org Fixes: c11760c06374 ("wifi: iwlegacy: 4965: fix potential off-by-one overflow in il4965_rs_fill_link_cmd()") Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20220815073737.GA999388@wp.pl Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A race condition may occur if the user calls close() on another thread
during a write() operation on the device node of the efi capsule.
This is a race condition that occurs between the efi_capsule_write() and
efi_capsule_flush() functions of efi_capsule_fops, which ultimately
results in UAF.
So, the page freeing process is modified to be done in
efi_capsule_release() instead of efi_capsule_flush().
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled. So add all skb to
a tmp list, then free them after spin_unlock_irqrestore() at
once.
Fixes: 54a94a72973a ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
User can use AF_PACKET socket to send packets with the length of 0.
When min_header_len equals to 0, packet_snd will call __dev_queue_xmit
to send packets, and sock->type can be any type.
Reported-by: syzbot+5ea725c25d06fb9114c4@syzkaller.appspotmail.com Fixes: 214de3de5b08 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pavel Begunkov [Thu, 1 Sep 2022 15:19:15 +0000 (16:19 +0100)]
io_uring: disable polling pollfree files
Older kernels lack io_uring POLLFREE handling. As only affected files
are signalfd and android binder the safest option would be to disable
polling those files via io_uring and hope there are no users.
Fixes: 0edea71e23c02 ("io_uring: add support for IORING_OP_POLL") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The assumption in __disable_kprobe() is wrong, and it could try to disarm
an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can
easily reproduce this issue.
1. Write 0 to /sys/kernel/debug/kprobes/enabled.
# echo 0 > /sys/kernel/debug/kprobes/enabled
2. Run execsnoop. At this time, one kprobe is disabled.
# /usr/share/bcc/tools/execsnoop &
[1] 2460
PCOMM PID PPID RET ARGS
# cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]
3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
kprobes_all_disarmed to false but does not arm the disabled kprobe.
# echo 1 > /sys/kernel/debug/kprobes/enabled
# cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]
4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().
# fg
/usr/share/bcc/tools/execsnoop
^C
Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
some cleanups and leaves the aggregated kprobe in the hash table. Then,
__unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
infinite loop like this.
We're seeing a weird problem in production where we have overlapping
extent items in the extent tree. It's unclear where these are coming
from, and in debugging we realized there's no check in the tree checker
for this sort of problem. Add a check to the tree-checker to make sure
that the extents do not overlap each other.
Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
Some pixel clock values could cause HDMI TMDS SSCPs to be misaligned
between different HDMI lanes when using YCbCr420 10-bit pixel format.
BIOS functions for transmitter/encoder control take pixel clock in kHz
increments, whereas the function for setting the pixel clock is in 100Hz
increments. Setting pixel clock to a value that is not on a kHz boundary
will cause the issue.
[How]
Round pixel clock down to nearest kHz in 10/12-bpc cases.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Brian Chang <Brian.Chang@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Normal processing of ARP request (usually this is Ethernet broadcast
packet) coming to the host is looking like the following:
* the packet comes to arp_process() call and is passed through routing
procedure
* the request is put into the queue using pneigh_enqueue() if
corresponding ARP record is not local (common case for container
records on the host)
* the request is processed by timer (within 80 jiffies by default) and
ARP reply is sent from the same arp_process() using
NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
pneigh_enqueue())
And here the problem comes. Linux kernel calls pneigh_queue_purge()
which destroys the whole queue of ARP requests on ANY network interface
start/stop event through __neigh_ifdown().
This is actually not a problem within the original world as network
interface start/stop was accessible to the host 'root' only, which
could do more destructive things. But the world is changed and there
are Linux containers available. Here container 'root' has an access
to this API and could be considered as untrusted user in the hosting
(container's) world.
Thus there is an attack vector to other containers on node when
container's root will endlessly start/stop interfaces. We have observed
similar situation on a real production node when docker container was
doing such activity and thus other containers on the node become not
accessible.
The patch proposed doing very simple thing. It drops only packets from
the same namespace in the pneigh_queue_purge() where network interface
state change is detected. This is enough to prevent the problem for the
whole node preserving original semantics of the code.
v2:
- do del_timer_sync() if queue is empty after pneigh_queue_purge()
v3:
- rebase to net tree
Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[Why]
After ODM clock off, optc underflow bit will be kept there always and clear not work.
We need to clear that before clock off.
[How]
Clear that if have when clock off.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Fudong Wang <Fudong.Wang@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[why]
In some cases MPC tree bottom pipe ends up point to itself. This causes
iterating from top to bottom to hang the system in an infinite loop.
[how]
When looping to next MPC bottom pipe, check that the pointer is not same
as current to avoid infinite loop.
Reviewed-by: Josip Pavic <Josip.Pavic@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
btrfs_lookup_dir_index_item() and btrfs_lookup_dir_item() lookup for dir
entries and both are used during log replay or when updating a log tree
during an unlink.
However when the dir item does not exists, btrfs_lookup_dir_item() returns
NULL while btrfs_lookup_dir_index_item() returns PTR_ERR(-ENOENT), and if
the dir item exists but there is no matching entry for a given name or
index, both return NULL. This makes the call sites during log replay to
be more verbose than necessary and it makes it easy to miss this slight
difference. Since we don't need to distinguish between those two cases,
make btrfs_lookup_dir_index_item() always return NULL when there is no
matching directory entry - either because there isn't any dir entry or
because there is one but it does not match the given name and index.
Also rename the argument 'objectid' of btrfs_lookup_dir_index_item() to
'index' since it is supposed to match an index number, and the name
'objectid' is not very good because it can easily be confused with an
inode number (like the inode number a dir entry points to).
CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
During renames we pin the logs of the roots a bit too early, before the
calls to btrfs_insert_inode_ref(). We can pin the logs after those calls,
since those will not change anything in a log tree.
In a scenario where we have multiple and diverse filesystem operations
running in parallel, those calls can take a significant amount of time,
due to lock contention on extent buffers, and delay log commits from other
tasks for longer than necessary.
So just pin logs after calls to btrfs_insert_inode_ref() and right before
the first operation that can update a log tree.
The following script that uses dbench was used for testing:
$ cat dbench-test.sh
#!/bin/bash
DEV=/dev/nvme0n1
MNT=/mnt/nvme0n1
MOUNT_OPTIONS="-o ssd"
MKFS_OPTIONS="-m single -d single"
echo "performance" | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
btrfs_search_slot is called in multiple places in dir-item.c to search
for a dir entry, and then calling btrfs_match_dir_name to return a
btrfs_dir_item.
In order to reduce the number of callers of btrfs_search_slot, create a
common function that looks for the dir key, and if found call
btrfs_match_dir_item_name.
Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anon_vma->degree tracks the combined number of child anon_vmas and VMAs
that use the anon_vma as their ->anon_vma.
anon_vma_clone() then assumes that for any anon_vma attached to
src->anon_vma_chain other than src->anon_vma, it is impossible for it to
be a leaf node of the VMA tree, meaning that for such VMAs ->degree is
elevated by 1 because of a child anon_vma, meaning that if ->degree
equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.
This assumption is wrong because the ->degree optimization leads to leaf
nodes being abandoned on anon_vma_clone() - an existing anon_vma is
reused and no new parent-child relationship is created. So it is
possible to reuse an anon_vma for one VMA while it is still tied to
another VMA.
This is an issue because is_mergeable_anon_vma() and its callers assume
that if two VMAs have the same ->anon_vma, the list of anon_vmas
attached to the VMAs is guaranteed to be the same. When this assumption
is violated, vma_merge() can merge pages into a VMA that is not attached
to the corresponding anon_vma, leading to dangling page->mapping
pointers that will be dereferenced during rmap walks.
Fix it by separately tracking the number of child anon_vmas and the
number of VMAs using the anon_vma as their ->anon_vma.
Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
skbs, that is, the flow->head is null.
The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
run a bpf prog which redirects empty skbs.
So we should determine whether the length of the packet modified by bpf
prog or others like bpf_prog_test is valid before forwarding it directly.
ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:
register_ftrace_function
ftrace_startup
__register_ftrace_function
...
add_ftrace_ops(&ftrace_ops_list, ops)
...
...
ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
...
return 0 // ops is in the ftrace_ops_list.
When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
ftrace_shutdown
if (unlikely(ftrace_disabled))
return -ENODEV; // return here, __unregister_ftrace_function is not executed,
// as a result, ops is still in the ftrace_ops_list
__unregister_ftrace_function
...
If ops is dynamically allocated, it will be free later, in this case,
is_ftrace_trampoline accesses NULL pointer:
is_ftrace_trampoline
ftrace_ops_trampoline
do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
In `do_fb_ioctl()` of fbmem.c, if cmd is FBIOPUT_VSCREENINFO, var will be
copied from user, then go through `fb_set_var()` and
`info->fbops->fb_check_var()` which could may be `pm2fb_check_var()`.
Along the path, `var->pixclock` won't be modified. This function checks
whether reciprocal of `var->pixclock` is too high. If `var->pixclock` is
zero, there will be a divide by zero error. So, it is necessary to check
whether denominator is zero to avoid crash. As this bug is found by
Syzkaller, logs are listed below.
The error handling code in pvr2_hdw_create forgets to unregister the
v4l2 device. When pvr2_hdw_create returns back to pvr2_context_create,
it calls pvr2_context_destroy to destroy context, but mp->hdw is NULL,
which leads to that pvr2_hdw_destroy directly returns.
Fix this by adding v4l2_device_unregister to decrease the refcount of
usb interface.
Reported-by: syzbot+77b432d57c4791183ed4@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the DMA mask is not set explicitly, the following warning occurs
when the userspace tries to access the dma-buf via the CPU as
reported by syzbot here:
It is possible for a malicious device to forgo submitting a Feature
Report. The HID Steam driver presently makes no prevision for this
and de-references the 'struct hid_report' pointer obtained from the
HID devices without first checking its validity. Let's change that.
Cc: Jiri Kosina <jikos@kernel.org> Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com> Cc: linux-input@vger.kernel.org Fixes: dbd9ea6f132d9 ("HID: add driver for Valve Steam Controller") Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When building an external module, if users don't need to separate the
compilation output and source code, they run the following command:
"make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)"
and "$(src)" are the same.
If they need to separate them, they run "make -C $(KERNEL_SRC_DIR)
O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the
command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to
prevent compilation failure.
So the kernel should change the included path to avoid the copy operation.
Signed-off-by: Jing Leng <jleng@ambarella.com>
[masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way,
but this patch is a nice clean up anyway.] Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
[nsc: updated context for v4.19] Signed-off-by: Nicolas Schier <n.schier@avm.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Older Intel CPUs that are not in the affected processor list for MMIO
Stale Data vulnerabilities currently report "Not affected" in sysfs,
which may not be correct. Vulnerability status for these older CPUs is
unknown.
Add known-not-affected CPUs to the whitelist. Report "unknown"
mitigation status for CPUs that are not in blacklist, whitelist and also
don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware
immunity to MMIO Stale Data vulnerabilities.
Mitigation is not deployed when the status is unknown.
[ bp: Massage, fixup. ]
Fixes: b8e6f6a4a110 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For non-protection pXd_none() page faults in do_dat_exception(), we
call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC).
In do_exception(), vma->vm_flags is checked against that before
calling handle_mm_fault().
Since commit c1533bde21029 ("[S390] store indication fault optimization"),
we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that
it was a write access. However, the vma flags check is still only
checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also
calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma
does not allow VM_WRITE.
Fix this by changing access check in do_exception() to VM_WRITE only,
when recognizing write access.
Some drivers rely on having all VMAs through which a PFN might be
accessible listed in the rmap for correctness.
However, on X86, it was possible for a VMA with stale TLB entries
to not be listed in the rmap.
This was fixed in mainline with
commit ab18087539e8 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"),
but that commit relies on preceding refactoring in
commit 35167b29df348 ("mmu_gather: Let there be one tlb_{start,end}_vma()
implementation") and commit a31d3b9463867 ("mmu_gather: Remove per arch
tlb_{start,end}_vma()").
This patch provides equivalent protection without needing that
refactoring, by forcing a TLB flush between removing PTEs in
unmap_vmas() and the call to unlink_file_vma() in free_pgtables().
[This is a stable-specific rewrite of the upstream commit!] Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it
doesn't need to make forward progress under memory pressure. Marking this
workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a
non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following
warning:
Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com Fixes: f7c8e77b9a4a ("scsi: storvsc: Allow only one remove lun work item to be issued per lun") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The problem was introduced by commit: d0bb442c6848 ("perf/x86/intel/uncore: Clean up client IMC")
Where the read_counter callback was replace to point to the generic
uncore_mmio_read_counter() function.
The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in
MMIO. But uncore_mmio_read_counter() is using a readq() call to read from
MMIO therefore reading 64-bit from MMIO. Although this is okay for the
uncore_perf_event_update() function because it is shifting the value based
on the actual counter width to compute a delta, it is not okay for the
uncore_pmu_event_start() which is simply reading the counter and therefore
priming the event->prev_count with a bogus value which is responsible for
causing bogus deltas in the perf stat command above.
The fix is to reintroduce the custom callback for read_counter for the SNB
IMC PMU and use readl() instead of readq(). With the change the output of
perf stat is back to normal:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000120987 296.94 MiB uncore_imc/data_reads/
1.000120987 138.42 MiB uncore_imc/data_writes/
2.000403144 175.91 MiB uncore_imc/data_reads/
2.000403144 68.50 MiB uncore_imc/data_writes/
Fixes: d0bb442c6848 ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.
Reproducers part of the patches.
I propose to backport both fixes to stable trees. The first fix needs a
small adjustment.
This patch (of 2):
Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.
Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)
And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.
Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com Fixes: ed2b8b4c23df ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Peter Feiner <pfeiner@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Jamie Liu <jamieliu@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> [3.18+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The freq Qos request would be removed repeatedly if the cpufreq policy
relates to more than one CPU. Then, it would cause the "called for unknown
object" warning.
Remove the freq Qos request for each CPU relates to the cpufreq policy,
instead of removing repeatedly for the last CPU of it.
Fixes: 340ca74d8cf3 ("ACPI: processor: Add QoS requests for all CPUs") Reported-by: Jeremy Linton <Jeremy.Linton@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Riwen Lu <luriwen@kylinos.cn> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pointers for guarded storage and runtime instrumentation control
blocks are stored in the thread_struct of the associated task. These
pointers are initially copied on fork() via arch_dup_task_struct()
and then cleared via copy_thread() before fork() returns. If fork()
happens to fail after the initial task dup and before copy_thread(),
the newly allocated task and associated thread_struct memory are
freed via free_task() -> arch_release_task_struct(). This results in
a double free of the guarded storage and runtime info structs
because the fields in the failed task still refer to memory
associated with the source task.
This problem can manifest as a BUG_ON() in set_freepointer() (with
CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled)
when running trinity syscall fuzz tests on s390x. To avoid this
problem, clear the associated pointer fields in
arch_dup_task_struct() immediately after the new task is copied.
Note that the RI flag is still cleared in copy_thread() because it
resides in thread stack memory and that is where stack info is
copied.