The check for spurious IRQs introduced in 18142dcb876a0 assumed that the bits
in the control and status registers are aligned. This is true for the H7 and MP1
versions, but not for the F4. The interrupt was therefore never handled on the F4.
Instead of increasing the complexity of the comparison and checking each bit specifically,
we remove this check completely and rely on the generic handler for spurious IRQs.
Fixes: 18142dcb876a ("iio: adc: stm32-adc: fix a regression when using dma and irq") Signed-off-by: Yannick Brosseau <yannick.brosseau@gmail.com> Reviewed-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Link: https://lore.kernel.org/r/20220516203939.3498673-3-yannick.brosseau@gmail.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The irq handler was only checking the mask of the first ADCs in the case of the
F4 and H7 generations, since it iterated only up to the num_irq value. This patch adds
the maximum number of ADCs to the common register description, which maps to the number of
entries of eoc_msk and ovr_msk in stm32_adc_common_regs. This allows the handler to check
all ADCs in that module.
Tested on a STM32F429NIH6.
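A minimal sketch of the resulting handler loop, assuming a num_adcs field is added next to the existing eoc_msk/ovr_msk arrays (the other variable names are illustrative):

/* check every ADC described by the common registers, not just the first num_irq ones */
for (i = 0; i < regs->num_adcs; i++) {
	if ((status & regs->eoc_msk[i]) || (status & regs->ovr_msk[i]))
		generic_handle_domain_irq(priv->domain, i);
}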
Fixes: 18142dcb876a ("iio: adc: stm32-adc: fix a regression when using dma and irq") Signed-off-by: Yannick Brosseau <yannick.brosseau@gmail.com> Reviewed-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com> Link: https://lore.kernel.org/r/20220516203939.3498673-2-yannick.brosseau@gmail.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When introducing support for processed channels I needed
to invert the expression:
if (!iio_channel_has_info(schan, IIO_CHAN_INFO_RAW) ||
!iio_channel_has_info(schan, IIO_CHAN_INFO_SCALE))
dev_err(dev, "source channel does not support raw/scale\n");
To the inverse, meaning detect when we can use raw+scale
rather than when we cannot. This was the result:
Ooops. Spot the error. Yep old George Boole came up and bit me.
That should be an &&.
The current code "mostly works" because we have not run into
systems supporting only raw but not scale, or only scale but not
raw, and I doubt many are using the rescaler on any such system,
but let's fix the logic.
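For clarity, the intended logic ANDs the two capability checks so raw+scale is used only when both are present; a sketch of that logic (not the exact driver code):

if (iio_channel_has_info(schan, IIO_CHAN_INFO_RAW) &&
    iio_channel_has_info(schan, IIO_CHAN_INFO_SCALE)) {
	/* raw and scale are both available, prefer them */
} else if (iio_channel_has_info(schan, IIO_CHAN_INFO_PROCESSED)) {
	/* fall back to the processed value */
} else {
	dev_err(dev, "source channel does not support raw+scale or processed\n");
	return -EINVAL;
}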
Ensure that the irq_work has completed before the trigger is freed.
==================================================================
BUG: KASAN: use-after-free in irq_work_run_list
Read of size 8 at addr 0000000064702248 by task python3/25
On fxls8471, after setting the reset bit, the device resets immediately and
does not ACK. So ignore the return value of this reset operation and let the
following code logic check whether the reset operation actually worked.
Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Fixes: e0c1f7544323 ("iio: mma8452: Initialise before activating") Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/1655292718-14287-1-git-send-email-haibo.chen@nxp.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IIO trigger interface function iio_trigger_get() should be called strictly
after iio_trigger_register() (or its devm analogue), because iio_trigger_get()
acquires a module refcnt based on the trigger->owner pointer, which is
initialized to THIS_MODULE inside iio_trigger_register().
If this call order is wrong, the next iio_trigger_put() (from sysfs
callback or "delete module" path) will dereference "default" module
refcnt, which is incorrect behaviour.
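A minimal sketch of the required ordering in a driver probe path (illustrative, not the exact driver code):

ret = devm_iio_trigger_register(dev, trig);
if (ret)
	return ret;

/* only now is trig->owner (THIS_MODULE) set, so the refcount is taken correctly */
indio_dev->trig = iio_trigger_get(trig);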
Fixes: 9610f70b81a2 ("iio: mxc4005: add data ready trigger for mxc4005") Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220524181150.9240-4-ddrokosov@sberdevices.ru Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IIO trigger interface function iio_trigger_get() should be called strictly
after iio_trigger_register() (or its devm analogue), because iio_trigger_get()
acquires a module refcnt based on the trigger->owner pointer, which is
initialized to THIS_MODULE inside iio_trigger_register().
If this call order is wrong, the next iio_trigger_put() (from sysfs
callback or "delete module" path) will dereference "default" module
refcnt, which is incorrect behaviour.
Fixes: a14b8a308800 ("iio: accel: kxcjk-1013: Increment ref counter for indio_dev->trig") Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220524181150.9240-3-ddrokosov@sberdevices.ru Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IIO trigger interface function iio_trigger_get() should be called strictly
after iio_trigger_register() (or its devm analogue), because iio_trigger_get()
acquires a module refcnt based on the trigger->owner pointer, which is
initialized to THIS_MODULE inside iio_trigger_register().
If this call order is wrong, the next iio_trigger_put() (from sysfs
callback or "delete module" path) will dereference "default" module
refcnt, which is incorrect behaviour.
Fixes: 524d1fd74343 ("iio: chemical: ccs811: Add support for data ready trigger") Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220524181150.9240-5-ddrokosov@sberdevices.ru Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
IIO trigger interface function iio_trigger_get() should be called strictly
after iio_trigger_register() (or its devm analogue), because iio_trigger_get()
acquires a module refcnt based on the trigger->owner pointer, which is
initialized to THIS_MODULE inside iio_trigger_register().
If this call order is wrong, the next iio_trigger_put() (from sysfs
callback or "delete module" path) will dereference "default" module
refcnt, which is incorrect behaviour.
Fixes: b965f0a498d6 ("iio: humidity: add support to hts221 rh/temp combo device") Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Link: https://lore.kernel.org/r/20220524181150.9240-6-ddrokosov@sberdevices.ru Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are hitting the following deadlock in production occasionally
Task 1 (fsync):
  fsync(A)
   start trans
Task 2 (transaction commit):
  start commit
Task 3 (falloc):
  falloc(A)
   lock 5m-10m
   start trans
    wait for commit
Task 4 (fiemap):
  fiemap(A)
   lock 0-10m
    wait for 5m-10m
    (have 0-5m locked)
Task 1 (fsync, continued):
   have btrfs_need_log_full_commit
    !full_sync
    wait_ordered_extents
Task 5 (ordered io):
  finish_ordered_io(A)
   lock 0-5m
    DEADLOCK
We have an existing dependency of file extent lock -> transaction.
However in fsync if we tried to do the fast logging, but then had to
fall back to committing the transaction, we will be forced to call
btrfs_wait_ordered_range() to make sure all of our extents are updated.
This creates a dependency of transaction -> file extent lock, because
btrfs_finish_ordered_io() will need to take the file extent lock in
order to run the ordered extents.
Fix this by stopping the transaction if we have to do the full commit
and we attempted to do the fast logging. Then attach to the transaction
and commit it if we need to.
CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In 731044a98001 "btrfs: switch extent buffer tree lock to rw_semaphore"
the functions for tree read locking were rewritten, and in the process
the read lock functions started setting eb->lock_owner = current->pid.
Previously lock_owner was only set in tree write lock functions.
Read locks are shared, so they don't have exclusive ownership of the
underlying object, so setting lock_owner to any single value for a
read lock makes no sense. It's mostly harmless because write locks
and read locks are mutually exclusive, and none of the existing code
in btrfs (btrfs_init_new_buffer and print_eb_refs_lock) cares what
nonsense is written in lock_owner when no writer is holding the lock.
KCSAN does care, and will complain about the data race incessantly.
Remove the assignments in the read lock functions because they're
useless noise.
Fixes: 731044a98001 ("btrfs: switch extent buffer tree lock to rw_semaphore") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arch/arm/boot/dts/r8a77470-iwg23s-sbc.dtb: usb@ee080100: phys: [[17, 0], [31]] is too long
From schema: Documentation/devicetree/bindings/usb/generic-ehci.yaml
arch/arm/boot/dts/r8a77470-iwg23s-sbc.dtb: usb@ee0c0100: phys: [[17, 1], [33], [21, 0]] is too long
From schema: Documentation/devicetree/bindings/usb/generic-ehci.yaml
Some USB EHCI controllers (e.g. on the Renesas RZ/G1C SoC) have multiple
PHYs. Increase the maximum number of PHYs to 3, which is sufficient for
now.
arch/arm/boot/dts/r8a77470-iwg23s-sbc.dtb: usb@ee080000: phys: [[17, 0], [31]] is too long
From schema: Documentation/devicetree/bindings/usb/generic-ohci.yaml
arch/arm/boot/dts/r8a77470-iwg23s-sbc.dtb: usb@ee0c0000: phys: [[17, 1], [33], [21, 0]] is too long
From schema: Documentation/devicetree/bindings/usb/generic-ohci.yaml
Some USB OHCI controllers (e.g. on the Renesas RZ/G1C SoC) have multiple
PHYs. Increase the maximum number of PHYs to 3, which is sufficient for
now.
The complete() function may be called even though the request has not
completed successfully. In this case, it's necessary to check the request
status so as not to set the device address wrongly.
Fixes: 1c3890f3f0e7 ("usb: chipidea: udc: update gadget states according to ch9")
cc: <stable@vger.kernel.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Link: https://lore.kernel.org/r/20220623030242.41796-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Re-reading a recently merged fix to the raw_gadget driver showed that
it inadvertently introduced a double-free bug in a failure pathway.
If raw_ioctl_init() encounters an error after the driver ID number has
been allocated, it deallocates the ID number before returning. But
when dev_free() runs later on, it will then try to deallocate the ID
number a second time.
Closely related to this issue is another error in the recent fix: The
ID number is stored in the raw_dev structure before the code checks to
see whether the structure has already been initialized, in which case
the new ID number would overwrite the earlier value.
The solution to both bugs is to keep the new ID number in a local
variable, and store it in the raw_dev structure only after the check
for prior initialization. No errors can occur after that point, so
the double-free will never happen.
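A sketch of the approach (names are based on the driver but the details are simplified):

int ret = 0, id;
unsigned long flags;

id = ida_alloc(&driver_id_numbers, GFP_KERNEL);	/* keep the ID local for now */
if (id < 0)
	return id;

spin_lock_irqsave(&dev->lock, flags);
if (dev->state != STATE_DEV_OPENED) {
	ret = -EINVAL;
	goto out_unlock;		/* the only place that frees the ID */
}
dev->driver_id_number = id;		/* no failure paths after this point */
spin_unlock_irqrestore(&dev->lock, flags);
return 0;

out_unlock:
spin_unlock_irqrestore(&dev->lock, flags);
ida_free(&driver_id_numbers, id);
return ret;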
In a report for a separate bug (which has already been fixed by commit 4013808fc2d8 "usb: gadget: fix race when gadget driver register via
ioctl") in the raw-gadget driver, the syzbot console log included
error messages caused by attempted registration of a new driver with
the same name as an existing driver:
> kobject_add_internal failed for raw-gadget with -EEXIST, don't try to register things with the same name in the same directory.
> UDC core: USB Raw Gadget: driver registration failed: -17
> misc raw-gadget: fail, usb_gadget_register_driver returned -17
These errors arise because raw_gadget.c registers a separate UDC
driver for each of the UDC instances it creates, but these drivers all
have the same name: "raw-gadget". Until recently this wasn't a
problem, but when the "gadget" bus was added and UDC drivers were
registered on this bus, it became possible for name conflicts to cause
the registrations to fail. The reason is simply that the bus code in
the driver core uses the driver name as a sysfs directory name (e.g.,
/sys/bus/gadget/drivers/raw-gadget/), and you can't create two
directories with the same pathname.
To fix this problem, the driver names used by raw-gadget are made
distinct by appending a unique ID number: "raw-gadget.N", with a
different value of N for each driver instance. And to avoid the
proliferation of error handling code in the raw_ioctl_init() routine,
the error return paths are refactored into the common pattern (goto
statements leading to cleanup code at the end of the routine).
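A sketch of how the unique name can be built from the ID (illustrative; the exact field names may differ):

/* e.g. "raw-gadget.0", "raw-gadget.1", ... */
dev->driver.driver.name = kasprintf(GFP_KERNEL, "%s.%d",
				    DRIVER_NAME, dev->driver_id_number);
if (!dev->driver.driver.name) {
	ret = -ENOMEM;
	goto out_unlock;
}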
Link: https://lore.kernel.org/all/0000000000008c664105dffae2eb@google.com/ Fixes: 49fdfe0d7d24 "USB: gadget: Add a new bus for gadgets" CC: Andrey Konovalov <andreyknvl@gmail.com> CC: <stable@vger.kernel.org> Reported-and-tested-by: syzbot+02b16343704b3af1667e@syzkaller.appspotmail.com Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Acked-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/YqdG32w+3h8c1s7z@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Meteor Lake TCSS (Type-C Subsystem) xHCI needs to be runtime suspended
whenever possible to allow the TCSS hardware block to enter D3cold and
thus save energy.
In the same way as Intel Alder Lake TCSS (Type-C Subsystem) the Raptor
Lake TCSS xHCI needs to be runtime suspended whenever possible to
allow the TCSS hardware block to enter D3cold and thus save energy.
If ports are not turned off in shutdown then runtime suspended
self-powered USB devices may survive in U3 link state over S5.
During subsequent boot, if firmware sends an IPC command to program
the port in DISCONNECT state, it will time out, causing significant
delay in the boot time.
Turning off roothub port power is also recommended in xhci
specification 4.19.4 "Port Power" in the additional note.
Intel SoC PMIC is a generic name for all PMICs that are used
on Intel platforms. In particular, INTEL_SOC_PMIC kernel configuration
option refers to Crystal Cove PMIC, which has never been a part
of any Intel Broxton hardware. Drop wrong dependency from Kconfig.
Note, the correct dependency is satisfied via ACPI PMIC OpRegion driver,
which the Type-C depends on.
Fixes: 838d94f602e4 ("usb: typec: add driver for Intel Whiskey Cove PMIC USB Type-C PHY") Reported-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20220620104316.57592-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The call to check if the calibration is all zeroes is doing
it wrong: memchr_inv() returns NULL if the calibration
contains all zeroes, but the check is for != NULL.
Fix it up. It's probably not an urgent fix because the inner
check for BIT(7) in data[13] will save us. But fix it.
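The semantics being fixed, as a minimal sketch: memchr_inv(buf, 0, len) returns NULL when every byte is zero, so "all zeroes" must be tested with a negated result.

/* memchr_inv() returns NULL if all bytes equal the given value */
if (!memchr_inv(data, 0, sizeof(data))) {
	dev_err(dev, "calibration is blank\n");
	return -EINVAL;
}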
Fixes: 9e5aa8e4e18b ("iio: magnetometer: Add driver for Yamaha YAS530") Reported-by: Jakob Hauser <jahau@rocketmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20220501195029.151852-1-linus.walleij@linaro.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Events CPU_CYCLES and INSTRUCTIONS can be submitted with two different
perf_event attribute::type values:
- PERF_TYPE_HARDWARE: when invoked via the perf tool predefined event names
cycles, cpu-cycles or instructions.
- pmu->type: when invoked via the perf tool event names cpu_cf/CPU_CYCLES/ or
cpu_cf/INSTRUCTIONS/. This invocation also selects the PMU to which
the event belongs.
Handle both types of invocation identically for events CPU_CYCLES and
INSTRUCTIONS. They address the same hardware.
Currently the result differs when the event modifier exclude_kernel is also
set: an invocation with this event modifier for user space event counting fails.
This device shares the PCI ID with the Samsung 970 Evo Plus, which
does not need or want the quirks. Move the quirk entry to the
core table based on the model number instead.
This particular Kioxia device times out and aborts I/O during any load,
but it's more easily observable with discards (fstrim).
The device also gets into a state in which it is not possible to use
"nvme set-feature" to disable APST.
Booting with nvme_core.default_ps_max_latency=0 solves the issue.
We had a dozen or so of these devices behaving this same way in
customer environments.
Commit cfc433845f56 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()")
has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to
the new tcp_bpf_update_proto() function. I'm guessing that this
was done to allow creating psocks for non-inet sockets.
Unfortunately the destruction path for psock includes the ULP
unwind, so we need to fail the sk_psock_init() itself.
Otherwise if ULP is already present we'll notice that later,
and call tcp_update_ulp() with the sk_proto of the ULP
itself, which will most likely result in the ULP looping
its callbacks.
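Conceptually, the check moves back so that psock creation itself fails when a ULP is already installed; a sketch of the guard in sk_psock_init() (error path simplified):

/* refuse to create a psock if a ULP is already attached */
if (inet_csk_has_ulp(sk)) {
	psock = ERR_PTR(-EINVAL);
	goto out;
}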
Fixes: cfc433845f56 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
This happens because virtnet_freeze() frees the receive_queue
completely (including struct xdp_rxq_info) but does not call
xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the
receive_queue again but does not call xdp_rxq_info_reg().
Actually, parts of virtnet_freeze_down() and virtnet_restore_up()
are almost identical to virtnet_close() and virtnet_open(): only
the calls to xdp_rxq_info_(un)reg() are missing. This means that
we can fix this easily and avoid such problems in the future by
just calling virtnet_close()/open() from the freeze/restore handlers.
Aside from adding the missing xdp_rxq_info calls the only difference
is that the refill work is only cancelled if netif_running(). However,
this should not make any functional difference since the refill work
should only be active if the network interface is actually up.
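A rough sketch of the resulting freeze/restore handlers (other restore steps, refill-work handling and error paths omitted):

static void virtnet_freeze_down(struct virtio_device *vdev)
{
	struct virtnet_info *vi = vdev->priv;

	netif_device_detach(vi->dev);
	if (netif_running(vi->dev))
		virtnet_close(vi->dev);		/* also unregisters xdp_rxq_info */
}

static int virtnet_restore_up(struct virtio_device *vdev)
{
	struct virtnet_info *vi = vdev->priv;
	int err = 0;

	if (netif_running(vi->dev))
		err = virtnet_open(vi->dev);	/* re-registers xdp_rxq_info */
	netif_device_attach(vi->dev);
	return err;
}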
Fixes: 3a0a20296560 ("virtio_net: setup xdp_rxq_info") Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Intel I210 on some Intel Alder Lake platforms can only achieve ~750Mbps
Tx speed via iperf. The RR2DCDELAY shows around 0x2xxx DMA delay, which
will be significantly lower when 1) ASPM is disabled or 2) SoC package
c-state stays above PC3. When the RR2DCDELAY is around 0x1xxx the Tx
speed can reach to ~950Mbps.
According to the I210 datasheet "8.26.1 PCIe Misc. Register - PCIEMISC",
"DMA Idle Indication" doesn't seem to tie to DMA coalesce anymore, so
set it to 1b for "DMA is considered idle when there is no Rx or Tx AND
when there are no TLPs indicating that CPU is active detected on the
PCIe link (such as the host executes CSR or Configuration register read
or write operation)" and performing Tx should also fall under "active
CPU on PCIe link" case.
In addition to that, commit ea8945e8f37a ("igb: Move DMA Coalescing init
code to separate function.") seems to have wrongly changed from enabling
E1000_PCIEMISC_LX_DECISION to disabling it; fix that as well.
Fixes: ea8945e8f37a ("igb: Move DMA Coalescing init code to separate function.") Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220621221056.604304-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
We need to divide the sub-irq status register offset by register
stride to get an index for the status buffer to avoid an out of
bounds write when the register stride is greater than 1.
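Conceptually (a sketch; the real regmap-irq code differs in details), with offset being the sub-IRQ status register offset:

unsigned int offset = subreg->offset[i];

/* offset is in register-address units; divide by the stride to index the buffer */
ret = regmap_read(map, chip->status_base + offset,
		  &data->status_buf[offset / map->reg_stride]);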
Fixes: 9d73ad8e526e ("regmap: regmap-irq: Add main status register support") Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Link: https://lore.kernel.org/r/20220620200644.1961936-3-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When enabling a type_in_mask irq, the type_buf contents must be
AND'd with the mask of the IRQ we're enabling to avoid enabling
other IRQs by accident, which can happen if several type_in_mask
irqs share a mask register.
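A sketch of the masking when unmasking such an IRQ (not the exact code):

/* contribute only this IRQ's bits from the shared type register */
if (d->chip->type_in_mask && type)
	mask = d->type_buf[irq_data->reg_offset / map->reg_stride] & irq_data->mask;
else
	mask = irq_data->mask;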
Fixes: 9072dbe5f3e7 ("regmap: irq: handle HW using separate rising/falling edge interrupts") Signed-off-by: Aidan MacDonald <aidanmacdonald.0x0@gmail.com> Link: https://lore.kernel.org/r/20220620200644.1961936-2-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the current implementation, ice_update_phy_type enables all link modes
for the selected speed. This approach doesn't work for 1000M speeds,
because both copper (1000baseT) and optical (1000baseX) standards
cannot be enabled at once.
Fix this, by adding the function `ice_set_phy_type_from_speed()`
for 1000M speeds.
Fixes: 6c429a84c56e ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The recent patch to make afs_getattr consult the server didn't account
for the pseudo-inodes employed by the dynamic root-type afs superblock
not having a volume or a server to access, and thus an oops occurs if
such a directory is stat'd.
Fix this by checking to see if the vnode->volume pointer actually points
anywhere before following it in afs_getattr().
This can be tested by stat'ing a directory in /afs. It may be
sufficient just to do "ls /afs" and the oops looks something like:
commit ab4fd97bb851 ("[PATCH] mips: update IRQ handling for vr41xx") added
a function irq_dispatch, which increases irq_err_count when the get_irq
callback returns a negative value, but the increase of irq_err_count in
get_irq itself was not removed.
And also, modpost complains once gpio-vr41xx drivers become modules.
ERROR: modpost: "irq_err_count" [drivers/gpio/gpio-vr41xx.ko] undefined!
So it would be a good idea to remove the repeated increase of irq_err_count
from the get_irq callback.
Fixes: cc5be02f60c6 ("MIPS: Update VR41xx GPIO driver to use gpiolib") Fixes: ab4fd97bb851 ("[PATCH] mips: update IRQ handling for vr41xx") Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: huhai <huhai@kylinos.cn> Signed-off-by: Genjian Zhang <zhanggenjian@kylinos.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
When a VBD is not fully created and then closed, the kernel can have a
NULL pointer dereference:
The reproducer is trivial:
[user@dom0 ~]$ sudo xl block-attach work backend=sys-usb vdev=xvdi target=/dev/sdz
[user@dom0 ~]$ xl block-list work
Vdev BE handle state evt-ch ring-ref BE-path
51712 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51712
51728 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51728
51744 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51744
51760 0 241 4 -1 -1 /local/domain/0/backend/vbd/241/51760
51840 3 241 3 -1 -1 /local/domain/3/backend/vbd/241/51840
^ note state, the /dev/sdz doesn't exist in the backend
[user@dom0 ~]$ sudo xl block-detach work xvdi
[user@dom0 ~]$ xl block-list work
Vdev BE handle state evt-ch ring-ref BE-path
work is an invalid domain identifier
info->rq and info->gd are only set in blkfront_connect(), which is
called for state 4 (XenbusStateConnected). Guard against using NULL
variables in blkfront_closing() to avoid the issue.
The rest of blkfront_closing looks okay. If info->nr_rings is 0, then
for_each_rinfo won't do anything.
blkfront_remove also needs to check for non-NULL pointers before
cleaning up the gendisk and request queue.
Fixes: 76f585ba75a6 "xen-blkfront: sanitize the removal state machine" Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220601195341.28581-1-jandryuk@gmail.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Before change:
make -C netfilter
TEST: performance
net,port [SKIP]
perf not supported
port,net [SKIP]
perf not supported
net6,port [SKIP]
perf not supported
port,proto [SKIP]
perf not supported
net6,port,mac [SKIP]
perf not supported
net6,port,mac,proto [SKIP]
perf not supported
net,mac [SKIP]
perf not supported
After change:
net,mac [ OK ]
baseline (drop from netdev hook): 2061098pps
baseline hash (non-ranged entries): 1606741pps
baseline rbtree (match on first field only): 1191607pps
set with 1000 full, ranged entries: 1639119pps
ok 8 selftests: netfilter: nft_concat_range.sh
Fixes: 2a315bb252e0 ("selftests: netfilter: Introduce tests for sets with range concatenation") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jie2x Zhou <jie2x.zhou@intel.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
After setting up ktls on the sock, tls_update() updates ctx->sk_proto to
sock->sk_prot, so ctx->sk_proto->close is now tls_sk_proto_close(). When the
sock is closed, tls_sk_proto_close() is called because sock->sk_prot->close
is tls_sk_proto_close(); but ctx->sk_proto->close() is then invoked again
later from within tls_sk_proto_close(). Thus tls_sk_proto_close() ends up
being executed repeatedly, which triggers the following bug.
=================================================================
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
Call Trace:
<TASK>
tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x18/0x20 net/socket.c:1365
Updating to a proto which is the same as sock->sk_prot is incorrect. Add a
check for proto and sock->sk_prot equality at the head of tls_update() to fix it.
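A minimal sketch of the guard (surrounding tls_update() code omitted):

/* in tls_update(): nothing to save if sk already uses this proto */
if (sk->sk_prot == p)
	return;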
Fixes: 8e0fd2340797 ("bpf: sockmap/tls, close can race with map free") Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fixes: 0c5e1bb756ca ("erspan: auto detect truncated ipv6 packets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Besides memory load and store operations, ARM SPE records can also carry
other operation types, but when setting the data source field the current
code assumes a record is either a load operation or a store operation, which
leads to wrongly synthesized memory samples.
This patch strictly checks the record operation type: it sets the data
source only for the operation types ARM_SPE_LD and ARM_SPE_ST and otherwise
returns zero for the data source. Therefore, we can synthesize memory samples
only when the data source is a non-zero value; the function
arm_spe__is_memory_event() is thus useless and is removed.
Fixes: 28128b5db06cbc2d ("perf arm-spe: Synthesize memory event") Reviewed-by: Ali Saidi <alisaidi@amazon.com> Reviewed-by: German Gomez <german.gomez@arm.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Ali Saidi <alisaidi@amazon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: alisaidi@amazon.com Cc: Andrew Kilroy <andrew.kilroy@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Li Huafei <lihuafei1@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Forrington <nick.forrington@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Link: http://lore.kernel.org/lkml/20220517020326.18580-5-alisaidi@amazon.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Display resolution change is implemented through drm modeset. The older
modeset (resolution) has to be disabled first before the newer modeset
(resolution) can be enabled. Disabling the display turns off both the
pixel clock and the main link clock, so the main link has to be
re-trained during display enable to get the new video stream flowing
again. In the current implementation, the display enable function manually
kicks off irq_hpd_handle(), which reads the panel link status and starts
link training if the link status is not in the sync state.
However, there is a rare case in which a particular panel's link status keeps
reporting sync for some period of time after the main link had been shut
down when the display was disabled. In this case, main link retraining
will not be executed by irq_hpd_handle(). Hence the video stream of the new
display resolution will fail to be transmitted to the panel because the main
link is not in sync between host and panel.
This patch bypasses irq_hpd_handle() in favor of calling
dp_ctrl_on_stream() directly, to always perform link training regardless of
the main link status, so that no unexpected resolution-change failures
happen. This is also more efficient than manually kicking off the
irq_hpd_handle function.
Changes in v2:
-- set force_link_train flag on DP only (is_edp == false)
Changes in v3:
-- revise commit text
-- add Fixes tag
Changes in v4:
-- revise commit text
Changes in v5:
-- fix spelling at commit text
Changes in v6:
-- split dp_ctrl_on_stream() for phy test case
-- revise commit text for modeset
Changes in v7:
-- drop 0 assignment at local variable (ret = 0)
Changes in v8:
-- add patch to remove pixel_rate from dp_ctrl
Changes in v9:
-- forward declare dp_ctrl_on_stream_phy_test_report()
Fixes: 25cbe2a47435 ("drm/msm/dp: fixes wrong connection state caused by failure of link train") Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/489895/ Link: https://lore.kernel.org/r/1655411200-7255-1-git-send-email-quic_khsieh@quicinc.com Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
In the current DP driver, regulators, clocks, irq and phy handling are grouped
together within single functions and are not executed in a symmetric manner.
This increases the difficulty of code maintenance and limits code scalability.
This patch divides the driver's life cycle of operation into four states:
resume (including booting up), dongle plug-in, dongle unplug and suspend.
Regulators, core clocks and irq are grouped together and enabled at resume
(or boot up) so that the DP controller is armed and ready to receive HPD
plug-in interrupts. An HPD plug-in interrupt is generated when a dongle plugs
into the DUT (device under test). Once the HPD plug-in interrupt is received,
the DP controller initializes the phy so that dpcd read/write works and the
following link training can proceed successfully. The DP phy is disabled
after the main link is torn down at the end of the unplug HPD interrupt
handling triggered by the dongle being unplugged from the DUT. Finally,
regulators, core clocks and irq are disabled at the corresponding suspend.
Changes in V2:
-- removed unnecessary dp_ctrl NULL check
-- removed unnecessary phy init_count and power_count DRM_DEBUG_DP logs
-- remove flip parameter out of dp_ctrl_irq_enable()
-- add fixes tag
Changes in V3:
-- call dp_display_host_phy_init() instead of dp_ctrl_phy_init() at
dp_display_host_init() for eDP
Changes in V4:
-- rewording commit text to match this commit changes
Changes in V5:
-- rebase on top of msm-next branch
Changes in V6:
-- delete flip variable
Changes in V7:
-- dp_ctrl_irq_enable/disabe() merged into dp_ctrl_reset_irq_ctrl()
Changes in V8:
-- add more detail comment regrading dp phy at dp_display_host_init()
Changes in V9:
-- remove set phy_initialized to false when -ECONNRESET detected
Changes in v10:
-- group into one series
Changes in v11:
-- drop drm/msm/dp: dp_link_parse_sink_count() return immediately
if aux read
Changes in v12:
-- move dp_display_host_phy_exit() after dp_display_host_deinit()
Changes in v13:
-- do not execute phy_init until plugged_in interrupt for edp, same as DP.
Changes in v14:
-- remove redundant dp->core_initialized = false form dp_pm_suspend.
Changes in v15:
-- remove core_initialized flag check at both host_init and host_deinit
Changes in v16:
-- remove dp_display_host_phy_exit core_initialized=false at dp_pm_suspend
Changes in v17:
-- remove core_initialized checking before execute attention_cb()
Changes in v18:
-- remove core_initialized checking at dp_pm_suspend
Check the aux read/write status in both dp_link_parse_sink_count()
and dp_link_parse_sink_status_filed() to avoid a long timeout delay if
the dp aux read/write times out because the cable was unplugged.
Changes in V4:
-- split this patch as stand alone patch
Changes in v5:
-- rebase on msm-next branch
Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Tested-by: Stephen Boyd <swboyd@chromium.org> Link: https://lore.kernel.org/r/1638985262-2072-1-git-send-email-quic_khsieh@quicinc.com Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
During the msm initialization phase, dp_display_unbind() will be called to undo
the initializations that had been done by dp_display_bind() previously if an
error happens in msm_drm_bind. In this case, the core_initialized flag has to
be checked to make sure the clocks are on before updating the DP controller
register to disable HPD interrupts. Otherwise the system will crash due to
the NOC fatal error below.
changes in v2:
-- drop the first patch (drm/msm: enable msm irq after all initializations are done successfully at msm_drm_init()) since the problem had been fixed by another patch
Fixes: a37c829d3eb1 ("drm/msm/dp: stop event kernel thread when DP unbind") Signed-off-by: Kuogee Hsieh <quic_khsieh@quicinc.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/488387/ Link: https://lore.kernel.org/r/1654538139-7450-1-git-send-email-quic_khsieh@quicinc.com Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
of_graph_get_remote_node() returns a remote device node pointer with its
refcount incremented; we should call of_node_put() on it when it is no
longer needed.
Add the missing of_node_put() to avoid a refcount leak.
Fixes: 6bbe0e33a90b ("drm: convert drivers to use of_graph_get_remote_node") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Abhinav Kumar <quic_abhinavk@quicinc.com>
Patchwork: https://patchwork.freedesktop.org/patch/488473/ Link: https://lore.kernel.org/r/20220607110841.53889-1-linmq006@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
qopt.latency is __u32, psched_tdiff_t is signed long,
(psched_tdiff_t)(UINT_MAX) is negative for 32-bit platforms, so
qopt.latency is always UINT_MAX.
Fix it by using psched_time_t (u64) instead.
Note: confusingly, users have two ways to specify 'latency':
1. normally, via '__u32 latency' in struct tc_netem_qopt;
2. via the TCA_NETEM_LATENCY64 attribute, which is s64.
For the second case, theoretically 'latency' could be negative. This
patch ignores that corner case, since it is broken (i.e. assigning a
negative s64 to __u32) anyways, and should be handled separately.
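A sketch of the corrected clamp in the dump path, assuming the usual min_t() pattern (psched_time_t is u64, so UINT_MAX is not sign-extended):

qopt.latency = min_t(psched_time_t, PSCHED_NS2TICKS(q->latency), UINT_MAX);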
Function fallback_set_params() checks if the module type returned
by a driver is ETH_MODULE_SFF_8079 and in this case assumes
that the buffer contains the concatenated content of pages A0h and A2h.
The check is wrong because the correct type is ETH_MODULE_SFF_8472.
Fixes: 720ead11a7de ("ethtool: Add fallback to get_module_eeprom from netlink command") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220616160856.3623273-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The bonding ARP monitor fails to decrement send_peer_notif, the
number of peer notifications (gratuitous ARP or ND) to be sent. This
results in a continuous series of notifications.
Correct this by decrementing the counter for each notification.
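A minimal sketch of the fix in the ARP monitor's notification path (illustrative):

if (bond->send_peer_notif) {
	bond->send_peer_notif--;	/* account for this notification */
	call_netdevice_notifiers(NETDEV_NOTIFY_PEERS, bond->dev);
}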
Reported-by: Jonathan Toppins <jtoppins@redhat.com> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Fixes: ab0a4d932d7e ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor") Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/ Tested-by: Jonathan Toppins <jtoppins@redhat.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/9400.1655407960@famine Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix the following use-after-free bug in igb_clean_tx_ring routine when
the NIC is running in XDP mode. The issue can be triggered redirecting
traffic into the igb NIC and then closing the device while the traffic
is flowing.
syzbot found the following issue on:
==================================================================
BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
net/tipc/name_distr.c:413
Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
In commit 09e2a56e204b ("tipc: fix a deadlock when flushing scheduled work"),
the cancel_work_sync() call was added just to make sure that any
tipc_net_finalize_work() work executing/pending on any CPU has completed
before the tipc namespace is destroyed through tipc_exit_net(). But this
function does not guarantee that the work is the last one queued, so the
destroyed instance may still be accessed by a work item enqueued later.
To fix this completely, re-order the call to cancel_work_sync() so that the
work tipc_net_finalize_work() is the last one queued and is guaranteed to
have completed once cancel_work_sync() returns.
Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com Fixes: 09e2a56e204b ("tipc: fix a deadlock when flushing scheduled work") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock is acquired with
bottom halves disabled in one place (as in write_lock_bh()) and somewhere
else with bottom halves enabled (as by read_lock() in netstat_show())
followed by disabling bottom halves (cxgb_get_stats() ->
t4_wr_mbox_meat_timeout() -> spin_lock_bh()). This is the reverse locking
order.
All read_lock() invocations are from sysfs callbacks which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.
Acquire the write lock of dev_base_lock without disabling bottom halves.
Reported-by: Pei Zhang <pezhang@redhat.com> Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 67a62a470b3c ("memblock: make memblock_find_in_range method
private") changed the API using which memory is reserved for the pKVM
hypervisor. However, memblock_phys_alloc() differs from the original API in
terms of kmemleak semantics -- the old one didn't report the reserved
regions to kmemleak while the new one does. Unfortunately, when protected
KVM is enabled, all kernel accesses to pKVM-private memory result in a
fatal exception, which can now happen because of kmemleak scans:
Even when the eth port is restricted to work with speeds not higher than 1G,
and so the eth driver requests the phy (via phylink) to advertise up
to 1000BASET support, the aquantia phy device is still advertising the 2.5G
and 5G speeds.
Clear these advertising defaults when requested.
Cc: Ondrej Spacek <ondrej.spacek@nxp.com> Fixes: a9894f007b7e1 ("net: phy: aquantia: add support for auto-negotiation configuration") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Current code is based on the idea that the max number of SGL entries
also determines the max size of an I/O request. While this idea was
true in older versions of the storvsc driver when SGL entry length
was limited to 4 Kbytes, commit 4afe14ec1b50 ("scsi: storvsc: Enable
scatterlist entry lengths > 4Kbytes") removed that limitation. It's
now theoretically possible for the block layer to send requests that
exceed the maximum size supported by Hyper-V. This problem doesn't
currently happen in practice because the block layer defaults to a
512 Kbyte maximum, while Hyper-V in Azure supports 2 Mbyte I/O sizes.
But some future configuration of Hyper-V could have a smaller max I/O
size, and the block layer could exceed that max.
Fix this by correctly setting max_sectors as well as sg_tablesize to
reflect the maximum I/O size that Hyper-V reports. While allowing
I/O sizes larger than the block layer default of 512 Kbytes doesn’t
provide any noticeable performance benefit in the tests we ran, it's
still appropriate to report the correct underlying Hyper-V capabilities
to the Linux block layer.
Also tweak the virt_boundary_mask to reflect that the required
alignment derives from Hyper-V communication using a 4 Kbyte page size,
and not on the guest page size, which might be bigger (eg. ARM64).
Link: https://lore.kernel.org/r/1655190355-28722-1-git-send-email-ssengar@linux.microsoft.com Fixes: 4afe14ec1b50 ("scsi: storvsc: Enable scatter list entry lengths > 4Kbytes") Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
On x86-64 the tail call count is passed from one BPF function to another
through %rax. Additionally, on function entry, the tail call count value
is stored on stack right after the BPF program stack, due to register
shortage.
The stored count is later loaded from stack either when performing a tail
call - to check if we have not reached the tail call limit - or before
calling another BPF function call in order to pass it via %rax.
In the latter case, we miscalculate the offset at which the tail call count
was stored on function entry. The JIT does not take into account that the
allocated BPF program stack is always a multiple of 8 on x86, while the
actual stack depth does not have to be.
This leads to a load from an offset that belongs to the BPF stack, as shown
in the example below:
SEC("tc")
int entry(struct __sk_buff *skb)
{
/* Have data on stack which size is not a multiple of 8 */
volatile char arr[1] = {};
return subprog_tail(skb);
}
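Conceptually, the fix is to derive the tail call count slot from the rounded-up stack size that was actually allocated; a sketch (the real JIT code differs):

/* the BPF stack is allocated as round_up(stack_depth, 8) bytes, so the saved
 * tail call count sits below that, not below the raw stack_depth */
int off = -(int)round_up(stack_depth, 8) - 8;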
If the component driver fails to bind, or is unbound, the driver data
for the top-level platform device points to a freed drm_device. If the
system is then suspended, the driver passes this dangling pointer to
drm_mode_config_helper_suspend(), which crashes.
Fix this by only setting the driver data while the platform driver holds
a reference to the drm_device.
Fixes: 7f543a094df8 ("drm: sun4i: Add support for suspending the display driver") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://lore.kernel.org/r/20220615054254.16352-1-samuel@sholland.org Signed-off-by: Sasha Levin <sashal@kernel.org>
A customer reported a request_socket leak in a Calico cloud environment. We
found that a BPF program was doing a socket lookup, which takes a refcnt on
the socket, and that it was finding the request_socket but returning the parent
LISTEN socket via sk_to_full_sk() without decrementing the child request socket
first, resulting in a request_sock slab object leak. This patch retains the
existing behaviour of returning full socks to the caller but it also decrements
the child request_socket if one is present before doing so to prevent the leak.
Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
thanks to Antoine Tenart for the reproducer and patch input.
v2 of this patch contains, refactor as per Daniel Borkmann's suggestions to
validate RCU flags on the listen socket so that it balances with bpf_sk_release()
and update comments as per Martin KaFai Lau's suggestion. One small change to
Daniels suggestion, put "sk = sk2" under "if (sk2 != sk)" to avoid an extra
instruction.
Fixes: c0ef450c4e84 ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()") Fixes: aa1f46455534 ("bpf: add skc_lookup_tcp helper") Co-developed-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Antoine Tenart <atenart@kernel.org> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The dma_map_sgtable() call (used to invalidate cache) overwrites sgt->nents
with 1, so msm_iommu_pagetable_map maps only the first physical segment.
To fix this problem use for_each_sgtable_sg(), which uses orig_nents.
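A sketch of the iteration change (the actual mapping details are elided):

struct scatterlist *sg;
int i;

/* walk orig_nents entries; sgt->nents may have been rewritten by the DMA API */
for_each_sgtable_sg(sgt, sg, i) {
	phys_addr_t phys = sg_phys(sg);
	size_t size = sg->length;

	/* ... map [addr, addr + size) to phys, then advance addr ... */
}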
Fixes: 3e9f68ad0a3d ("drm/msm: Add support to create a local pagetable") Signed-off-by: Jonathan Marek <jonathan@marek.ca> Link: https://lore.kernel.org/r/20220613221019.11399-1-jonathan@marek.ca Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Originally the cq reservation was performed first, followed by the skb
allocation. Commit 74bed4e284d7 ("xdp: fix possible cq entry leak")
reversed the order because at the time there was no mechanism available
to undo the cq reservation which could have led to possible cq entry leaks
in the event of skb allocation failure. However if the skb allocation is
performed first and the cq reservation then fails, the xsk skb destructor
is called which blindly adds the skb address to the already full cq leading
to undefined behavior.
This commit restores the original order (cq reservation followed by skb
allocation) and uses the xskq_prod_cancel helper to undo the cq reserve
in event of skb allocation failure.
Fixes: 74bed4e284d7 ("xdp: fix possible cq entry leak") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20220614070746.8871-1-ciara.loftus@intel.com Signed-off-by: Sasha Levin <sashal@kernel.org>
The kernel returns an endpoint ID as r.ep_connect_ret.handle in the
iscsi_uevent. The iscsid validates a received endpoint ID and treats zero
as an error. The commit referenced in the fixes line changed the endpoint
ID range, and zero is always assigned to the first endpoint ID. So, the
first attempt to create a new iSER connection always fails.
Link: https://lore.kernel.org/r/20220613123854.55073-1-sergeygo@nvidia.com Fixes: 109831bd0572 ("scsi: iscsi: Release endpoint ID when its freed") Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Sergey Gorenko <sergeygo@nvidia.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
In msm_devfreq_suspend() we cancel idle_work synchronously so that it
doesn't run after we power off the hw or in the resume path. But this
means that we want to ensure that idle_work is not scheduled *after* we
no longer hold a runpm ref. So switch the ordering of pm_runtime_put()
vs msm_devfreq_idle().
v2. Only move the runpm _put_autosuspend, and not the _mark_last_busy()
When a write command to a sequential write required or sequential write
preferred zone result in the zone write pointer reaching the end of the
zone, the zone condition must be set to full AND the number of implicitly
or explicitly open zones updated to have a correct accounting for zone
resources. However, the function zbc_inc_wp() only sets the zone condition
to full without updating the open zone counters, resulting in a zone state
machine breakage.
Introduce the helper function zbc_set_zone_full() and use it in
zbc_inc_wp() to correctly transition zones to the full condition.
Link: https://lore.kernel.org/r/20220608011302.92061-1-damien.lemoal@opensource.wdc.com Fixes: 6a87ea528821 ("scsi: scsi_debug: Add ZBC zone commands") Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
bh might occur while updating per-cpu rnd_state from user context,
ie. local_out path.
BUG: using smp_processor_id() in preemptible [00000000] code: nginx/2725
caller is nft_ng_random_eval+0x24/0x54 [nft_numgen]
Call Trace:
check_preemption_disabled+0xde/0xe0
nft_ng_random_eval+0x24/0x54 [nft_numgen]
Use the random driver instead; this also avoids the need for local prandom
state. Moreover, prandom now uses the random driver since c009001b3ca7
("random32: use real rng for non-deterministic randomness").
Based on earlier patch from Pablo Neira.
Fixes: fbdcd787eba3 ("netfilter: nft_meta: place prandom handling in a helper") Fixes: 88be92da0192 ("netfilter: nft_numgen: add map lookups for numgen random operations") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Following commit bc038d79801a ("drm/msm: fix unbalanced
pm_runtime_enable in adreno_gpu_{init, cleanup}"), any call to
adreno_unbind() will disable runtime PM twice, as indicated by the call
trees below:
Note that pm_runtime_force_suspend() is called right before
gpu->funcs->destroy() and both functions are called unconditionally.
With recent addition of the eDP AUX bus code, this problem manifests
itself when the eDP panel cannot be found yet and probing is deferred.
On the first probe attempt, we disable runtime PM twice as described
above. This then causes any later probe attempt to fail with
[drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -13
preventing the driver from loading.
As there seem to be scenarios where the aNxx_destroy() functions are not
called from adreno_unbind(), simply removing pm_runtime_disable() from
inside adreno_unbind() does not seem to be the proper fix. This is what
commit bc038d79801a ("drm/msm: fix unbalanced pm_runtime_enable in
adreno_gpu_{init, cleanup}") intended to fix. Therefore, instead check
whether runtime PM is still enabled, and only disable it in that case.
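A minimal sketch of the guarded disable (illustrative):

/* only balance pm_runtime_enable() if it has not been disabled already */
if (pm_runtime_enabled(&pdev->dev))
	pm_runtime_disable(&pdev->dev);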
Fixes: bc038d79801a ("drm/msm: fix unbalanced pm_runtime_enable in adreno_gpu_{init, cleanup}") Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Tested-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Rob Clark <robdclark@gmail.com> Link: https://lore.kernel.org/r/20220606211305.189585-1-luzmaximilian@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If a GEM object is allocated, and then exported as a dma-buf fd which is
mmap'd before or without the GEM buffer being directly mmap'd, the
vma_node could be unitialized. This leads to a situation where the CPU
mapping is not correctly torn down in drm_vma_node_unmap().
Fixes: ad5d5b65086b ("drm: call drm_gem_object_funcs.mmap with fake offset") Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20220531200857.136547-1-robdclark@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
RM500K provides 2 mandatory interfaces to the Linux host after enumeration.
- /dev/ttyUSB5: this is a serial interface for control path. User needs
to write AT commands to this device node to query status, set APN,
set PIN code, and enable/disable the data connection to 5G network.
- ethX: this is the data path provided as an RNDIS device. After the
data connection has been established, Linux host can access 5G data
network via this interface.
"RNDIS": RNDIS + ADB + AT (/dev/ttyUSB5) + MODEM COMs
The EM05-G modem has 2 USB configurations that are configurable via the AT
command AT+QCFG="usbnet",[ 0 | 2 ] which make the modem enumerate with
the following interfaces, respectively:
Add support for further HXN (G) type devices (GT variant, GL variant, GS
variant and GR) and document the bcdDevice mapping.
Note that the TA and TB types use the same bcdDevice as some GT and GE
variants, respectively, but that the HX status request can be used to
determine which is which.
Also note that we currently do not distinguish between the various HXN
(G) types in the driver but that this may change eventually (e.g. when
adding GPIO support).
There is a small chance that get_kretprobe(ri) returns NULL in
kretprobe_dispatcher() when another CPU unregisters the kretprobe
right after __kretprobe_trampoline_handler().
To avoid this issue, kretprobe_dispatcher() checks the get_kretprobe()
return value again, and if it is NULL, it returns immediately because that
kretprobe is in the process of being unregistered.
This issue was introduced when the kretprobe was decoupled
from the struct kretprobe_instance by commit 4fa79739c229
("kprobes: Remove kretprobe hash"). Before that commit,
struct kretprobe_instance::rp directly pointed to the kretprobe
and could never be NULL.
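A sketch of the added guard in kretprobe_dispatcher() (simplified):

struct kretprobe *rp = get_kretprobe(ri);

/* another CPU may have unregistered the kretprobe already */
if (unlikely(!rp))
	return 0;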
Link: https://lkml.kernel.org/r/165366693881.797669.16926184644089588731.stgit@devnote2 Reported-by: Yonghong Song <yhs@fb.com> Fixes: 4fa79739c229 ("kprobes: Remove kretprobe hash") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: bpf <bpf@vger.kernel.org> Cc: Kernel Team <kernel-team@fb.com> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit f569a3ade581 ("dm mirror log: round up region bitmap size to
BITS_PER_LONG") introduced a regression on 64-bit architectures in the
lvm testsuite tests: lvcreate-mirror, mirror-names and vgsplit-operation.
If the device is shrunk, we need to clear log bits beyond the end of the
device. The code clears bits up to a 32-bit boundary and then calculates
lc->sync_count by summing set bits up to a 64-bit boundary (the commit
changed that; previously, this boundary was 32-bit too). So, it was using
some non-zeroed bits in the calculation and this caused misbehavior.
Fix this regression by clearing bits up to BITS_PER_LONG boundary.
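A sketch of the fix, assuming the bit helpers used by the mirror log code:

  /* Clear the bits beyond the end of the device up to a BITS_PER_LONG
   * boundary (not just a 32-bit boundary), so that summing set bits over
   * 64-bit words only ever sees zeroed padding. */
  for (i = lc->region_count; i % BITS_PER_LONG; i++)
          log_clear_bit(lc, lc->clean_bits, i);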
Fixes: f569a3ade581 ("dm mirror log: round up region bitmap size to BITS_PER_LONG") Cc: stable@vger.kernel.org Reported-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
During postsuspend, dm-era does the following:
1. Archives the current era
2. Commits the metadata, as part of the RPC call for archiving the
current era
3. Stops the worker
Until the worker stops, it might write to the metadata again. Moreover,
these writes are not flushed to disk immediately, but are cached by the
dm-bufio client, which writes them back asynchronously.
As a result, the committed metadata of a suspended dm-era device might
not be consistent with the in-core metadata.
In some cases, this can result in the corruption of the on-disk
metadata. Suppose the following sequence of events:
1. Load a new table, e.g. a snapshot-origin table, to a device with a
dm-era table
2. Suspend the device
3. dm-era commits its metadata, but the worker does a few more metadata
writes until it stops, as part of digesting an archived writeset
4. These writes are cached by the dm-bufio client
5. Load the dm-era table to another device.
6. The new instance of the dm-era target loads the committed, on-disk
metadata, which don't include the extra writes done by the worker
after the metadata commit.
7. Resume the new device
8. The new dm-era target instance starts using the metadata
9. Resume the original device
10. The destructor of the old dm-era target instance is called and
destroys the dm-bufio client, which results in flushing the cached
writes to disk
11. These writes might overwrite the writes done by the new dm-era
instance, hence corrupting its metadata.
Fix this by committing the metadata after the worker stops running.
stop_worker uses flush_workqueue to flush the current work. However, the
work item may re-queue itself and flush_workqueue doesn't wait for
re-queued works to finish.
This could result in the worker changing the metadata after it has been
committed, or writing to the metadata concurrently with the commit
in the postsuspend thread.
Use drain_workqueue instead, which waits until the work and all
re-queued works finish.
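A minimal sketch of the change, assuming the usual stop_worker() shape with the workqueue pointer kept in the target context:

  static void stop_worker(struct era *era)
  {
          atomic_set(&era->suspended, 1);
          /* Unlike flush_workqueue(), drain_workqueue() also waits for
           * work items that re-queue themselves. */
          drain_workqueue(era->wq);
  }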
The DEVICE_BUSY_TIMEOUT value is described in the Reference Manual as:
| Timeout waiting for NAND Ready/Busy or ATA IRQ. Used in WAIT_FOR_READY
| mode. This value is the number of GPMI_CLK cycles multiplied by 4096.
So instead of multiplying the value in cycles by 4096, we have to
divide it by that value. Use DIV_ROUND_UP to make sure we are on the
safe side, especially when the calculated value in cycles is smaller
than 4096, as is typically the case.
This bug likely never triggered because any timeout != 0 usually will
do. In my case the busy timeout in cycles was originally calculated as
2408, which multiplied with 4096 is 0x968000. The lower 16 bits were
taken for the 16 bit wide register field, so the register value was
0x8000. With 607663bc3b1d ("mtd: rawnand: gpmi: fix controller timings
setting") however the value in cycles became 2384, which multiplied
with 4096 is 0x950000. The lower 16 bits are now 0x0, resulting in an
immediate timeout when reading from NAND.
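A hedged sketch of the corrected calculation (the register helper macro name is assumed):

  /* The field expects the timeout in units of 4096 GPMI_CLK cycles, so
   * divide instead of multiplying, rounding up to stay on the safe side. */
  hw->timing1 = BF_GPMI_TIMING1_DEVICE_BUSY_TIMEOUT(
                        DIV_ROUND_UP(busy_timeout_cycles, 4096));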
Fixes: 16c1b23db3788 ("mtd: rawnand: gpmi: use core timings instead of an empirical derivation") Cc: stable@vger.kernel.org Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://lore.kernel.org/linux-mtd/20220614083138.3455683-1-s.hauer@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The IOMMU mailing list will move from lists.linux-foundation.org to
lists.linux.dev. The hard switch of the archive will happen on July
5th, but add the new list now so that people start using it when sending
patches. After July 5th the old list will disappear.
unmap_grant_pages() currently waits for the pages to no longer be used.
In https://github.com/QubesOS/qubes-issues/issues/7481, this led to a
deadlock against i915: i915 was waiting for gntdev's MMU notifier to
finish, while gntdev was waiting for i915 to free its pages. I also
believe this is responsible for various deadlocks I have experienced in
the past.
Avoid these problems by making unmap_grant_pages async. This requires
making it return void, as any errors will not be available when the
function returns. Fortunately, the only use of the return value is a
WARN_ON(), which can be replaced by a WARN_ON() at the point where the
error is detected. Additionally, a failed call will not prevent further calls
from being made, but this is harmless.
Because unmap_grant_pages is now async, the grant handle will be set to
INVALID_GRANT_HANDLE too late to prevent multiple unmaps of the same
handle. Instead, a separate bool array is allocated for this purpose.
This wastes memory, but stuffing this information in padding bytes is
too fragile. Furthermore, it is necessary to grab a reference to the
map before making the asynchronous call, and release the reference when
the call returns.
It is also necessary to guard against reentrancy in gntdev_map_put(),
and to handle the case where userspace tries to map a mapping whose
contents have not all been freed yet.
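A rough sketch of the reference handling around the asynchronous unmap (structure and helper names assumed, not the exact gntdev code):

  /* The caller takes an extra reference before scheduling the unmap,
   * e.g. refcount_inc(&map->users), so the map cannot go away while the
   * operation is in flight; the completion callback below reports any
   * error and drops that reference again. */
  static void unmap_grant_pages_done(int result,
                                     struct gntab_unmap_queue_data *data)
  {
          struct gntdev_grant_map *map = data->data;

          WARN_ON(result);            /* error is only visible here */
          gntdev_put_map(NULL, map);  /* release the extra reference */
  }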
Fixes: c671096378af ("xen/gntdev: safely unmap grants in case they are still in use") Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220622022726.2538-1-demi@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
MediaTek IP requires that after DMA stop, the driver waits for this DMA
stop bit to auto-reset to 0. When the bus is under a high load, it takes a
while for the DMA stop to complete. Without this wait, the bus will hang
when the code goes on to clear the FIFO and reset.
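A hedged sketch of such a wait (register and field names assumed), using the generic polling helper from <linux/iopoll.h>:

  u32 val;

  /* Wait for the DMA stop bit to self-clear before clearing the FIFO or
   * issuing a reset; report an error if it never clears. */
  if (readl_poll_timeout(host->base + MSDC_DMA_CTRL, val,
                         !(val & MSDC_DMA_CTRL_STOP), 1, 20000))
          dev_err(host->dev, "DMA stop bit did not self-clear\n");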
In addition, msdc_data_xfer_next() should not return early if there is
still data to be transferred, because no matter what error occurs here,
execution should continue on to the following mmc_request_done().
Otherwise the core layer may wait for completion forever.
Currently, the sub-queues and event pool resources are allocated/freed for
every CRQ connection event such as reset and LPM. This exposes the driver
to a couple of issues. First, the inefficiency of freeing and reallocating
memory that can simply be reused after being sanitized. Further, a system
under memory pressure runs the risk of allocation failures that could result
in a crippled driver. Finally, there is a race window where command
submission/completion can try to pull/return elements from/to an event
pool that is being deleted or has already been deleted due to the lack of
host state around freeing/allocating resources. The following is an example
of list corruption following a live partition migration (LPM):
Currently the back pointer from a queue to the vhost adapter isn't set
until after subcrq interrupt registration. The value is available when a
queue is first allocated and can and should also be set for primary and async
queues as well as subcrqs.
This fixes a crash observed during kexec/kdump on Power 9 with legacy XICS
interrupt controller where a pending subcrq interrupt from the previous
kernel can be replayed immediately upon IRQ registration resulting in
dereference of a garbage backpointer in ibmvfc_interrupt_scsi().
Almost none of the errors stemming from a valid mount option with a wrong
value print a descriptive message that would help identify why the mount
failed. Like in the linked report:
$ uname -r
v4.19
$ mount -o compress=zstd /dev/sdb /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on
/dev/sdb, missing codepage or helper program, or other error.
$ dmesg
...
BTRFS error (device sdb): open_ctree failed
Errors caused by memory allocation failures are left out, as they are not
user errors, so reporting them would be confusing.
Upstream commit e429df2748ef ("btrfs: force v2 space cache usage for
subpage mount") forces subpage mount to use v2 cache, to avoid
deprecated v1 cache which doesn't support subpage properly.
But there is a loophole: the user can still remount to the v1 cache.
The existing check will only give users a warning, but does not really
prevent the remount.
Although remounting to v1 will not cause any problems, since the v1 cache
will always be marked invalid when mounted with a different page size,
it's still better to prevent the v1 cache altogether for subpage mounts.
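A hedged sketch of the stricter check (the exact placement in the option parsing code may differ):

  /* On subpage filesystems (sectorsize < PAGE_SIZE), reject any attempt
   * to (re)enable the v1 space cache instead of merely warning. */
  if (fs_info->sectorsize < PAGE_SIZE &&
      btrfs_test_opt(fs_info, SPACE_CACHE)) {
          btrfs_err(fs_info,
                    "v1 space cache is not supported for page size %lu with sectorsize %u",
                    PAGE_SIZE, fs_info->sectorsize);
          return -EINVAL;
  }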
Fixes: e429df2748ef ("btrfs: force v2 space cache usage for subpage mount") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we start an unmount, at close_ctree(), if we have the reclaim task
running and in the middle of a data block group relocation, we can trigger
a deadlock when stopping an async reclaim task, producing a trace like the
following:
1) Before entering close_ctree() we have the async block group reclaim
task running and relocating a data block group;
2) There's an async metadata (or data) space reclaim task running;
3) We enter close_ctree() and park the cleaner kthread;
4) The async space reclaim task is at flush_space() and runs all the
existing delayed iputs;
5) Before the async space reclaim task calls
btrfs_wait_on_delayed_iputs(), the block group reclaim task which is
doing the data block group relocation, creates a delayed iput at
replace_file_extents() (called when COWing leaves that have file extent
items pointing to relocated data extents, during the merging phase
of relocation roots);
6) The async space reclaim task blocks at
btrfs_wait_on_delayed_iputs(), since we have a new delayed iput;
7) The task at close_ctree() then calls cancel_work_sync() to stop the
async space reclaim task, but it blocks since that task is waiting for
the delayed iput to be run;
8) The delayed iput is never run because the cleaner kthread is parked,
and no one else runs delayed iputs, resulting in a hang.
So fix this by stopping the async block group reclaim task before we
park the cleaner kthread.
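A minimal sketch of the ordering change in close_ctree() (surrounding teardown steps omitted):

  /* Stop the block group reclaim task first, while the cleaner kthread
   * can still run any delayed iputs that the reclaim task creates. */
  cancel_work_sync(&fs_info->reclaim_bgs_work);

  /* Only then park the cleaner and stop the async space reclaim work. */
  kthread_park(fs_info->cleaner_kthread);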
Decrement the refcount of the parent dentry's fid after walking
each path component during a full path walk for a lookup. Failure to do
so can lead to fids that are not clunked until the filesystem is
unmounted, as indicated by this warning:
9pnet: found fid 3 not clunked
The improper refcounting after walking resulted in open(2) returning
-EIO on any directories underneath the mount point when using the virtio
transport. When using the fd transport, there's no apparent issue until
the filesystem is unmounted and the warning above is emitted to the logs.
In some cases, the user may not yet be attached to the filesystem and a
new root fid, associated with the user, is created and attached to the
root dentry before the full path walk is performed. Increment the new
root fid's refcount to two in that situation so that it can be safely
decremented to one after it is used for the walk operation. The new fid
will still be attached to the root dentry when
v9fs_fid_lookup_with_uid() returns so a final refcount of one is
correct/expected.
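A rough sketch of the per-component walk with the reference drop (loop and error handling simplified; with fid refcounting, clunking drops a reference):

  /* Walk one component at a time; once the child fid has been obtained,
   * the parent fid is no longer needed for this lookup, so release it
   * instead of leaking it until unmount. */
  child = p9_client_walk(parent, 1, &wname, 1);
  if (IS_ERR(child))
          return child;
  p9_client_clunk(parent);
  parent = child;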
When a packet enters the OVS datapath and does not match any existing
flows installed in the kernel flow cache, the packet will be sent to
userspace to be parsed, and a new flow will be created. The kernel and
OVS rely on each other to parse packet fields in the same way so that
packets will be handled properly.
As per the design document linked below, OVS expects all later IPv6
fragments to have nw_proto=44 in the flow key, so they can be correctly
matched on OpenFlow rules. OpenFlow controllers create pipelines based
on this design.
This behavior was changed by the commit in the Fixes tag so that
nw_proto equals the next_header field of the last extension header.
However, there is no counterpart for this change in OVS userspace,
meaning that this field is parsed differently between OVS and the
kernel. This is a problem because OVS creates actions based on what is
parsed in userspace, but the kernel-provided flow key is used as a match
criteria, as described in Documentation/networking/openvswitch.rst. This
leads to issues such as packets incorrectly matching on a flow and thus
the wrong list of actions being applied to the packet. Such changes in
packet parsing cannot be implemented without breaking the userspace.
The offending commit is partially reverted to restore the expected
behavior.
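A hedged sketch of the restored key assignment in the IPv6 parsing path (field and constant names as found in the UAPI and ipv6 headers):

  /* For later fragments, report the Fragment Header protocol number (44)
   * in the flow key, matching what OVS userspace expects. */
  if (key->ip.frag == OVS_FRAG_TYPE_LATER)
          key->ip.proto = NEXTHDR_FRAGMENT;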
The change technically made sense and there is a good reason that it was
implemented, but it does not comply with the original design of OVS.
If in the future someone wants to implement such a change, then it must
be user-configurable and disabled by default to preserve backwards
compatibility with existing OVS versions.