The link state is set to false if there is no link and no local sink. Even
though the stream state may not change, it is desirable to commit the
new stream when HPD goes from low to high.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Co-developed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 59de751e3845 ("drm/amd/display: add ODM case when looking for first split pipe") Signed-off-by: Sasha Levin <sashal@kernel.org>
Currently, userspace doesn't have a way to communicate selective updates
to displays. So, enable support for FB_DAMAGE_CLIPS for DCN ASICs newer
than DCN301, convert DRM damage clips to dc dirty rectangles and fill
them into dirty_rects in fill_dc_dirty_rects().
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Stable-dep-of: 72f1de49ffb9 ("drm/dp_mst: Clear MSG_RDY flag before sending new message") Signed-off-by: Sasha Levin <sashal@kernel.org>
The RK3399 PCIe endpoint controller cannot generate MSI-X IRQs.
This is documented in the RK3399 technical reference manual (TRM)
section 17.5.9 "Interrupt Support".
MSI-X capability should therefore not be advertised. Remove the
MSI-X capability by editing the capability linked-list. The
previous entry is the MSI capability, therefore get the next
entry from the MSI-X capability entry and set it as next entry
for the MSI capability. This in effect removes MSI-X from the list.
Linked list before: MSI cap -> MSI-X cap -> PCIe Device cap -> ...
Linked list now:    MSI cap -> PCIe Device cap -> ...
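A minimal sketch of the unlinking step, assuming hypothetical cfg_read8()/cfg_write8() accessors and capability offsets (the real driver programs the Rockchip endpoint's configuration registers); only PCI_CAP_LIST_NEXT is a real kernel define:

    /* Hypothetical accessors and offsets; illustrative only. */
    u8 msix_next = cfg_read8(ep, msix_cap_off + PCI_CAP_LIST_NEXT);

    /* Point the MSI capability's "next" pointer past the MSI-X entry. */
    cfg_write8(ep, msi_cap_off + PCI_CAP_LIST_NEXT, msix_next);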
Link: https://lore.kernel.org/r/20230418074700.1083505-11-rick.wertenbroek@gmail.com Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
The RK3399 PCIe endpoint core has 33 windows for PCIe space. The driver now
uses up to 32 fixed-size (1M) windows, with pages allocated and mapped
accordingly. The driver previously used a single window and allocated space
inside it, which caused translation issues (between CPU space and PCI space)
because a window can only hold a single translation at a given time, so
allocating multiple pages inside one window causes conflicts. Now each window
is a single 1M region, which guarantees that translations never conflict.
Set the translation register addresses for the physical function, as
documented in the technical reference manual (TRM) section 17.5.5 "PCIe
Address Translation" and section 17.6.8 "Address Translation Registers
Description".
Link: https://lore.kernel.org/r/20230418074700.1083505-9-rick.wertenbroek@gmail.com Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
Remove write accesses to registers that are marked "unused" (and
therefore read-only) in the technical reference manual (TRM); see
RK3399 TRM section 17.6.8.1.
Link: https://lore.kernel.org/r/20230418074700.1083505-2-rick.wertenbroek@gmail.com Tested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Stable-dep-of: dc73ed0f1b8b ("PCI: rockchip: Fix window mapping and address translation for endpoint") Signed-off-by: Sasha Levin <sashal@kernel.org>
PCIe r6.0.1, sec 7.5.3.7, recommends setting the link control parameters,
then waiting for the Link Training bit to be clear before setting the
Retrain Link bit.
This avoids a race where the LTSSM may not use the updated parameters if it
is already in the midst of link training because of other normal link
activity.
Wait for the Link Training bit to be clear before toggling the Retrain Link
bit to ensure that the LTSSM uses the updated link control parameters.
[bhelgaas: commit log, return 0 (success)/-ETIMEDOUT instead of bool for
both pcie_wait_for_retrain() and the existing pcie_retrain_link()] Suggested-by: Lukas Wunner <lukas@wunner.de> Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support") Link: https://lore.kernel.org/r/20230502083923.34562-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
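A rough sketch of what such a wait helper could look like (a one-second timeout is assumed here; the real constant in drivers/pci/pcie/aspm.c may differ), polling the Link Training bit and returning 0 or -ETIMEDOUT as noted above:

    static int pcie_wait_for_retrain(struct pci_dev *pdev)
    {
            unsigned long end_jiffies = jiffies + msecs_to_jiffies(1000);
            u16 lnksta;

            do {
                    pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
                    if (!(lnksta & PCI_EXP_LNKSTA_LT))
                            return 0;       /* link training finished */
                    msleep(1);
            } while (time_before(jiffies, end_jiffies));

            return -ETIMEDOUT;
    }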
"pcie_retrain_link" is not a question with a true/false answer, so "bool"
isn't quite the right return type. Return 0 for success or -ETIMEDOUT if
the retrain failed. No functional change intended.
After commit 0e96ea5c3eb5904e5dc2f ("MIPS: Loongson64: Clean up use of
cc-ifversion") we get a build error when make modules_install:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
The reason is that when running 'make modules_install', 'call cc-option'
does not work in the $(KBUILD_CFLAGS) used by 'CHECKFLAGS'. As a result,
-mno-loongson-mmi is not applied and -march=loongson3a enables MMI
instructions.
In detail, the error message comes from the CHECKFLAGS invocation of
$(CC), but it has no impact on the final result of 'make modules_install';
it is purely a cosmetic issue. The error occurs because cc-option is
defined in scripts/Makefile.compiler, which is not included in the Makefile
when running 'make modules_install', as install targets are not supposed
to require the compiler; see commit 805b2e1d427aab4b ("kbuild: include
Makefile.compiler only when compiler is needed"). As a result, the check
for '-mno-loongson-mmi' simply never happens.
Fix this by partially reverting to the old logic: use 'call cc-option'
to conditionally apply -march=loongson3a or -march=mips64r2.
Loongson-2E/2F was also broken by commit 13ceb48bc19c563e05f4
("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls"), so fix it
as well.
Fixes: 13ceb48bc19c563e05f4 ("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls") Fixes: 0e96ea5c3eb5904e5dc2 ("MIPS: Loongson64: Clean up use of cc-ifversion") Cc: stable@vger.kernel.org Cc: Feiyang Chen <chenfeiyang@loongson.cn> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Use a pointer dereference instead of spelling out the data structure type
as the argument to the "sizeof" operator, to make the corresponding size
determination a bit safer, according to the Linux coding style convention.
This issue was detected by using the Coccinelle software.
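An illustrative before/after of the pattern (variable and structure names are only examples, not the exact driver lines):

    struct nmk_i2c_dev *priv;

    /* preferred: the size follows the pointer's type automatically */
    priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);

    /* instead of spelling out the structure type:
     *   priv = devm_kzalloc(&pdev->dev, sizeof(struct nmk_i2c_dev), GFP_KERNEL);
     */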
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Stable-dep-of: 05f933d5f731 ("i2c: nomadik: Remove a useless call in the remove function") Signed-off-by: Sasha Levin <sashal@kernel.org>
These issues were detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Wolfram Sang <wsa@kernel.org>
Stable-dep-of: 05f933d5f731 ("i2c: nomadik: Remove a useless call in the remove function") Signed-off-by: Sasha Levin <sashal@kernel.org>
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable()).
This can be sporadically triggered by test case btrfs/255 from fstests:
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
# --- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
# +++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
# @@ -1,2 +1,4 @@
# QA output created by 255
# +ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
# +There may be more info in syslog - try dmesg | tail
# Silence is golden
# ...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
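A hedged sketch of the shape of the fix; the exact placement inside btrfs_quota_disable() in fs/btrfs/qgroup.c differs:

    mutex_lock(&fs_info->cleaner_mutex);
    /* ... mark quotas disabled and delete the quota root (btrfs_del_root()),
     * now mutually exclusive with block group relocation ... */
    mutex_unlock(&fs_info->cleaner_mutex);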
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes") CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Uwe Kleine-König pointed out we still have one resource leak in the mvebu
driver triggered on driver detach. Let's address it with a custom devm
action.
This allows us to get rid of a call to pwmchip_remove() in the error path. There
is no .remove function for this driver, so this change fixes a resource leak
when a gpio-mvebu device is unbound.
devm_pwmchip_add() can be called by a module that optionally
instantiates a PWM chip. In the case of CONFIG_PWM=n, such code cannot
be compiled. Hence, add the necessary stub.
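A sketch of the kind of stub this adds in include/linux/pwm.h; the return value shown is an assumption and the real stub may differ:

    /* in the !CONFIG_PWM branch of include/linux/pwm.h */
    static inline int devm_pwmchip_add(struct device *dev, struct pwm_chip *chip)
    {
            return 0;       /* assumed; the real stub's return value may differ */
    }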
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Thierry Reding <thierry.reding@gmail.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make tps68470_gpio_output() call tps68470_gpio_set() for output-only pins
too, so that the initial value passed to gpiod_direction_output() is
honored for these pins too.
Fixes: 275b13a65547 ("gpio: Add support for TPS68470 GPIOs") Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Tested-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The check being unconditional may lead to unwanted denials reported by
LSMs when a process has the capability granted by DAC, but denied by an
LSM. In the case of SELinux such denials are a problem, since they can't
be effectively filtered out via the policy and when not silenced, they
produce noise that may hide a true problem or an attack.
Since not having the capability merely means that the created io_uring
context will be accounted against the current user's RLIMIT_MEMLOCK
limit, we can disable auditing of denials for this check by using
ns_capable_noaudit() instead of capable().
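Illustrative shape of the change in io_uring's memlock-accounting check (the variable name here is illustrative, not a quote of the actual code):

    bool account_mem;

    /* before: audited, may generate an LSM denial record */
    account_mem = !capable(CAP_IPC_LOCK);

    /* after: same DAC check, but denials are not audited */
    account_mem = !ns_capable_noaudit(&init_user_ns, CAP_IPC_LOCK);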
The index field of the struct page corresponding to a guest ASCE should
be 0. When replacing the ASCE in s390_replace_asce(), the index of the
new ASCE should also be set to 0.
Having the wrong index might lead to the wrong addresses being passed
around when notifying pte invalidations, and eventually to validity
intercepts (VM crash) if the prefix gets unmapped and the notifier gets
called with the wrong address.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Fixes: faa2f72cb356 ("KVM: s390: pv: leak the topmost page table when destroy fails") Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20230705111937.33472-3-imbrenda@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Commit 8ef7b9e1765a ("powerpc/pseries/vas: Close windows with DLPAR
core removal") unmaps the window paste address and issues an HCALL to
close the window in the hypervisor for migration or DLPAR core removal
events, so it holds mmap_mutex and then the mmap lock before unmapping
the paste address. But if user space mmaps the paste address at the same
time as the migration event, coproc_mmap() is called with the mmap lock
already held, which can trigger a deadlock when it tries to acquire
mmap_mutex in coproc_mmap().
Fix this deadlock by taking the mmap lock before mmap_mutex
in reconfig_close_windows().
Fixes: 8ef7b9e1765a ("powerpc/pseries/vas: Close windows with DLPAR core removal") Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230716100506.7833-1-haren@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
* blk_mq_plug_issue_direct() is entered with an mq_list containing two
requests.
* For the first request, it sets last == false and enters the driver's
queue_rq callback.
* The driver queue_rq callback indirectly calls schedule() which calls
blk_flush_plug(). This may happen if the driver has the
BLK_MQ_F_BLOCKING flag set and is allowed to sleep in ->queue_rq.
* blk_flush_plug() handles the remaining request in the mq_list. mq_list
is now empty.
* The original call to queue_rq resumes (with last == false).
* The loop in blk_mq_plug_issue_direct() terminates because there are no
remaining requests in mq_list.
The IO is now stalled because the last request submitted to the driver
had last == false and there was no subsequent call to commit_rqs().
Fix this by returning early in blk_mq_flush_plug_list() if rq_count is 0
which it will be in the recursive case, rather than checking if the
mq_list is empty. At the same time, adjust one of the callers to skip
the mq_list empty check as it is not necessary.
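A sketch of the fix's shape (the real function in block/blk-mq.c does much more than this):

    void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
    {
            /* rq_count is 0 in the recursive blk_flush_plug() case even
             * though mq_list may look usable, so bail out early. */
            if (!plug->rq_count)
                    return;

            /* ... issue the plugged requests ... */
    }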
Fixes: dc5fc361d891 ("block: attempt direct issue of plug list") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230714101106.3635611-1-ross.lagerwall@citrix.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list

P1                                        wb_workfn
jbd2_log_do_checkpoint
 if (buffer_locked(bh)) // false
                                          __block_write_full_page
                                           trylock_buffer(bh)
                                           test_clear_buffer_dirty(bh)
 if (!buffer_dirty(bh))
  __jbd2_journal_remove_checkpoint(jh)
                                           if (buffer_write_io_error(bh)) // false
                                           >> bh IO error occurs <<
jbd2_cleanup_journal_tail
 __jbd2_update_log_tail
  jbd2_write_superblock
  // The bh won't be replayed in next mount.
This could corrupt the ext4 image; a reproducer can be fetched via [Link].
Since the writeback process clears the buffer dirty bit after locking the
buffer head, we can fix this by trying to lock the buffer and checking its
dirtiness while the buffer is locked: the buffer head can be removed if it
is neither dirty nor locked.
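A pseudocode-level sketch of the checkpoint-side decision described above (illustrative only, not the exact hunk in fs/jbd2/checkpoint.c):

    if (!trylock_buffer(bh)) {
            /* writeback holds the buffer lock; leave it for a later pass */
    } else if (buffer_dirty(bh)) {
            unlock_buffer(bh);
            /* still dirty: must be written out before checkpoint removal */
    } else {
            /* neither dirty nor in flight: safe to drop from the list */
            __jbd2_journal_remove_checkpoint(jh);
            unlock_buffer(bh);
    }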
SMU13 overrides dynamic PCIe lane width and dynamic speed on certain
hosts. Commit 38e4ced80479 ("drm/amd/pm: conditionally disable
pcie lane switching for some sienna_cichlid SKUs") worked around this
issue by setting up certain limits for certain SKUs, but the same
fundamental problem with those hosts affects all SMU11 implementations
as well, so align the SMU11 and SMU13 driver handling.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This helper is used for checking whether the connected host supports
the feature; it can be moved into generic code to be used by other
SMU implementations as well.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
apmf_get_system_params() failure is not a critical event; reduce its
verbosity from dev_err to dev_dbg.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20230714144435.1239776-1-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
APMF fn8 can notify EC about the OS slider position change. Add this
capability to the PMF driver so that it can call the APMF fn8 based on
the changes in the Platform profile events.
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Patil Rajesh Reddy <Patil.Reddy@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Link: https://lore.kernel.org/r/20230714144435.1239776-2-Shyam-sundar.S-k@amd.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The set element addition error path decrements the reference counter on
chains twice: once on element release and again via nft_data_release().
Then, d6b478666ffa ("netfilter: nf_tables: fix underflow in object
reference counter") incorrectly fixed this by removing the stateful
object reference count decrement.
Restore the stateful object decrement as in b91d90368837 ("netfilter:
nf_tables: fix leaking object reference count") and let
nft_data_release() decrement the chain reference counter, so this is
done only once.
Fixes: d6b478666ffa ("netfilter: nf_tables: fix underflow in object reference counter") Fixes: 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since ("netfilter: nf_tables: drop map element references from
preparation phase"), integration with commit protocol is better,
therefore drop the workaround that b91d90368837 ("netfilter: nf_tables:
fix leaking object reference count") provides.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This change causes a regression when an eDP panel and an external display
are in mirror mode. When the external display supports a lower resolution
than the eDP panel, using the eDP timing to drive the external display may
cause corruption on the external display.
[Why]
A specific TBT4 dock doesn't send out a short HPD to notify the source
that the IRQ event DOWN_REP_MSG_RDY is set, which violates the spec
and prevents the source from sending out streams to MST sinks.
[How]
To cover this misbehavior, add an additional polling method to detect
that DOWN_REP_MSG_RDY is set. The HPD-driven handling method is still kept.
Just hook up our handler to drm mgr->cbs->poll_hpd_irq().
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Acked-by: Alan Liu <haoping.liu@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
Linux amdgpu defers handling of the link lost IRQ: DM adds a handling
request to the IRQ work queue for the first link lost IRQ.
If link training fails during link lost handling, the link will not
be enabled anymore.
[How]
Allow adding a link lost handling request to the work queue
before running DP link training for link lost.
Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ Modified due to not having c5a31f178e352 ("drm/amd/display: move dp irq handler functions from dc_link_dp to link_dp_irq_handler")
until kernel 6.3-rc1.] Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the following errors & warnings reported by checkpatch:
ERROR: space required before the open brace '{'
ERROR: space required before the open parenthesis '('
ERROR: that open brace { should be on the previous line
ERROR: space prohibited before that ',' (ctx:WxW)
ERROR: else should follow close brace '}'
ERROR: open brace '{' following function definitions go on the next line
ERROR: code indent should use tabs where possible
WARNING: braces {} are not necessary for single statement blocks
WARNING: void function return statements are not generally useful
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ Modified for missing c5a31f178e35 ("drm/amd/display: move dp irq handler functions from dc_link_dp to link_dp_irq_handler")
which landed in 6.3] Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
During DP DSC compliance tests, the requested bpc would
change between sub-tests, which requires the stream
to be recommitted.
[How]
Force the connector to disconnect and reconnect whenever
there is a bpc change in an automated test.
Reviewed-by: Jerry Zuo <Jerry.Zuo@amd.com> Acked-by: Alan Liu <HaoPing.Liu@amd.com> Signed-off-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: hersen wu <hersenxs.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ Adjustments for headers that were moved around in later commits. ] Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
The sequence for collecting down_reply from source perspective should
be:
Request_n->repeat (get partial reply of Request_n->clear message ready
flag to ack DPRX that the message is received) till all partial
replies for Request_n are received->new Request_n+1.
Now there is a chance that drm_dp_mst_hpd_irq() will fire a new down
request in the tx queue while the down reply is still incomplete. The
source is restricted from generating interleaved message transactions,
so we should avoid this.
Also, while assembling partial reply packets, reading out DPCD DOWN_REP
Sideband MSG buffer + clearing DOWN_REP_MSG_RDY flag should be
wrapped up as a complete operation for reading out a reply packet.
Kicking off a new request before clearing the DOWN_REP_MSG_RDY flag might
be risky: e.g., if the reply for the new request overwrites the
DPRX DOWN_REP sideband MSG buffer before the source writes to clear the
DOWN_REP_MSG_RDY flag, the source then unintentionally flushes the reply
for the new request. Up requests should be handled in the same way.
[How]
Separate drm_dp_mst_hpd_irq() into 2 steps. After acking the MST IRQ
event, the driver calls drm_dp_mst_hpd_irq_send_new_request(), which might
trigger drm_dp_mst_kick_tx() only when there is no ongoing message
transaction.
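Illustrative driver-side usage after the split; the buffer sizes and exact argument types here are assumptions, so check the kerneldoc in drm_dp_mst_topology.c for the real contract:

    u8 esi[4] = {}, ack[4] = {};
    bool handled = false;

    /* step 1: parse the ESI, queue work, and compute which bits to ack */
    drm_dp_mst_hpd_irq_handle_event(mgr, esi, ack, &handled);

    /* the driver writes 'ack' back to the DPCD ESI registers here,
     * e.g. to clear DOWN_REP_MSG_RDY, before ... */

    /* step 2: only now kick off any new queued sideband request */
    drm_dp_mst_hpd_irq_send_new_request(mgr);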
Changes since v1:
* Reworked on review comments received
-> Adjust the fix to let driver explicitly kick off new down request
when mst irq event is handled and acked
-> Adjust the commit message
Changes since v2:
* Adjust the commit message
* Adjust the naming of the divided 2 functions and add a new input
parameter "ack".
* Adjust code flow as per review comments.
Changes since v3:
* Update the function description of drm_dp_mst_hpd_irq_handle_event
Changes since v4:
* Change ack of drm_dp_mst_hpd_irq_handle_event() to be an array matching
the size of esi[]
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix the following checkpatch checks in amdgpu_dm.c
CHECK: Prefer kernel type 'u8' over 'uint8_t'
CHECK: Prefer kernel type 'u32' over 'uint32_t'
CHECK: Prefer kernel type 'u64' over 'uint64_t'
CHECK: Prefer kernel type 's32' over 'int32_t'
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[ PSR-SU support was introduced in kernel 6.2 with commits like 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
but PSR-SU isn't enabled in 6.1.y, so this block needs to be skipped
when backporting. ] Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since the quirk is handled in the DRM core now, we can use that value
instead of the internal value.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sk_assign is failing on an s390x machine running Debian "bookworm" for
2 reasons: legacy server_map definition and uninitialized addrlen in
recvfrom() call.
Fix by adding a new-style server_map definition and dropping addrlen
(recvfrom() allows NULL values for src_addr and addrlen).
Since the test should support tc built without libbpf, build the prog
twice: with the old-style definition and with the new-style definition,
then select the right one at runtime. This could be done at compile
time too, but this would not be cross-compilation friendly.
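A minimal userspace illustration of the recvfrom() detail relied on here: src_addr and addrlen may both be NULL when the sender's address is not needed.

    char buf[64];
    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);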
Insn 180 is a call to 'do_bind'. The call's return value is also the return value
for the program. Since do_bind() returns 0/1, it is legitimate for the compiler to
optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, such
optimization breaks the verifier, as the return value of 'do_bind()' is marked as an
arbitrary scalar, which violates the requirement that the prog return value be 0/1.
There are two ways to fix this problem, (1) changing 'return 1' in do_bind() to
e.g. 'return 10' so the compiler has to do 'do_bind(ctx) ? 1 :0', or (2)
suggested by Andrii, marking do_bind() with __weak attribute so the compiler
cannot make any assumption on do_bind() return value.
This patch adopts the __weak approach, which is simpler and more resistant
to potential compiler optimizations.
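A sketch of the shape of the fix in the selftest BPF program (the body is elided and the section/function names are assumed from the selftest); __weak prevents the compiler from assuming the callee's 0/1 return value:

    __weak int do_bind(struct bpf_sock_addr *ctx)
    {
            /* ... call bpf_bind() etc. ... */
            return 1;
    }

    SEC("cgroup/connect4")
    int connect_v4_prog(struct bpf_sock_addr *ctx)
    {
            /* stays 'do_bind(ctx) ? 1 : 0' after optimization */
            return do_bind(ctx) ? 1 : 0;
    }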
test_align selftest relies on BPF verifier log emitting register states
for specific instructions in expected format. Unfortunately, BPF
verifier precision backtracking log interferes with such expectations.
And instructions on which precision propagation happens sometimes don't
output the full expected register states. This does indeed look like
something to be improved in the BPF verifier, but it is beyond the scope of
this patch set.
So to make test_align a bit more robust, inject a few dummy R4 = R5
instructions which capture the desired state of R5 and won't have precision
tracking logs on them. This fixes the tests until we can improve BPF
verifier output in the presence of precision tracking.
Exploit the property of about-to-be-checkpointed state to be able to
forget all precise markings up to that point even more aggressively. We
now clear all potentially inherited precise markings right before
checkpointing and branching off into child state. If any of children
states require precise knowledge of any SCALAR register, those will be
propagated backwards later on before this state is finalized, preserving
correctness.
There is a single selftests BPF program change, but a tremendous one: a 25x
reduction in the number of verified instructions and states in
trace_virtqueue_add_sgs.
Cilium results are more modest, but happen across wider range of programs.
SELFTESTS RESULTS
=================
$ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results.csv ~/imprecise-aggressive-results.csv | grep -v '+0'
File Program Total insns (A) Total insns (B) Total insns (DIFF) Total states (A) Total states (B) Total states (DIFF)
------------------- ----------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
loop6.bpf.linked1.o trace_virtqueue_add_sgs 398057 15114 -382943 (-96.20%) 8717 336 -8381 (-96.15%)
------------------- ----------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
Setting reg->precise to true in the current state is not necessary from a
correctness standpoint, but it does pessimise the whole precision (or
rather "imprecision", because that's what we want to keep as much as
possible) tracking. Why is somewhat subtle, and my best attempt to
explain this is recorded in an extensive comment for the __mark_chain_precise()
function. Some more careful thinking and code reading is probably still
required to grok this completely, unfortunately. Whiteboarding and a bunch
of extra handwaving in person would be even more helpful, but is deemed
impractical in a Git commit.
Next patch pushes this imprecision property even further, building on top of
the insights described in this patch.
End results are pretty nice, we get reduction in number of total instructions
and states verified due to a better states reuse, as some of the states are now
more generic and permissive due to less unnecessary precise=true requirements.
A slight regression in test_tc_dtime.bpf.linked1.o/ingress_fwdns_prio101
is left for a follow-up; there might be some more precision-related bugs
in the existing BPF verifier logic.
Stop forcing precise=true for SCALAR registers when BPF program has any
subprograms. Current restriction means that any BPF program, as soon as
it uses subprograms, will end up not getting any of the precision
tracking benefits in reduction of number of verified states.
This patch keeps the fallback mark_all_scalars_precise() behavior if
precise marking has to cross function frames. E.g., if subprogram
requires R1 (first input arg) to be marked precise, ideally we'd need to
backtrack to the parent function and keep marking R1 and its
dependencies as precise. But right now we give up and force all the
SCALARs in any of the current and parent states to be forced to
precise=true. We can lift that restriction in the future.
But this patch fixes two issues identified when trying to enable
precision tracking for subprogs.
First, prevent "escaping" from top-most state in a global subprog. While
with entry-level BPF program we never end up requesting precision for
R1-R5 registers, because R2-R5 are not initialized (and so not readable
in correct BPF program), and R1 is PTR_TO_CTX, not SCALAR, and so is
implicitly precise. With global subprogs, though, it's different, as
global subprog a) can have up to 5 SCALAR input arguments, which might
get marked as precise=true and b) it is validated in isolation from its
main entry BPF program. b) means that we can end up exhausting parent
state chain and still not mark all registers in reg_mask as precise,
which would lead to verifier bug warning.
To handle that, we need to consider two cases. First, if the very first
state is not immediately "checkpointed" (i.e., stored in state lookup
hashtable), it will get correct first_insn_idx and last_insn_idx
instruction set during state checkpointing. As such, this case is
already handled and __mark_chain_precision() already handles that by
just doing nothing when we reach to the very first parent state.
st->parent will be NULL and we'll just stop. Perhaps some extra check
for reg_mask and stack_mask is due here, but this patch doesn't address
that issue.
More problematic second case is when global function's initial state is
immediately checkpointed before we manage to process the very first
instruction. This is happening because when there is a call to global
subprog from the main program the very first subprog's instruction is
marked as pruning point, so before we manage to process first
instruction we have to check and checkpoint state. This patch adds
a special handling for such "empty" state, which is identified by having
st->last_insn_idx set to -1. In such case, we check that we are indeed
validating global subprog, and with some sanity checking we mark input
args as precise if requested.
Note that we also initialize state->first_insn_idx with correct start
insn_idx offset. For main program zero is correct value, but for any
subprog it's quite confusing to not have first_insn_idx set. This
doesn't have any functional impact, but helps with debugging and state
printing. We also explicitly initialize state->last_insns_idx instead of
relying on is_state_visited() to do this with env->prev_insns_idx, which
will be -1 on the very first instruction. This concludes necessary
changes to handle specifically global subprog's precision tracking.
Second identified problem was missed handling of BPF helper functions
that call into subprogs (e.g., bpf_loop and few others). From precision
tracking and backtracking logic's standpoint those are effectively calls
into subprogs and should be called as BPF_PSEUDO_CALL calls.
This patch takes the least intrusive way and just checks against a short
list of current BPF helpers that do call subprogs, encapsulated in
is_callback_calling_function() function. But to prevent accidentally
forgetting to add new BPF helpers to this "list", we also do a sanity
check in __check_func_call, which has to be called for each such special
BPF helper, to validate that BPF helper is indeed recognized as
callback-calling one. This should catch any missed checks in the future.
Adding some special flags to be added in function proto definitions
seemed like an overkill in this case.
With the above changes, it's possible to remove forceful setting of
reg->precise to true in __mark_reg_unknown, which turns on precision
tracking both inside subprogs and entry progs that have subprogs. No
warnings or errors were detected across all the selftests, but also when
validating with veristat against internal Meta BPF objects and Cilium
objects. Further, in some BPF programs there are noticeable reduction in
number of states and instructions validated due to more effective
precision tracking, especially benefiting syncookie test.
Commit 010a0aad39fc ("kallsyms: Correctly sequence symbols when
CONFIG_LTO_CLANG=y") added --lto-clang, and updated the usage()
function, but not the comment. Update it in the same way.
The comment in scripts/kallsyms.c describing the usage of
scripts/kallsyms does not reflect the latest implementation.
Fix the comment to be equivalent to what the usage() function prints.
My randconfig build setup ran into another kallsyms warning:
Inconsistent kallsyms data
Try make KALLSYMS_EXTRA_PASS=1 as a workaround
After adding some debugging code to kallsyms.c, I saw that the recently
added kallsyms_seqs_of_names symbol can sometimes cause the second stage
table to be slightly longer than the first stage, which makes the
build inconsistent.
Add it to the exception table that contains all other kallsyms-generated
symbols.
Fixes: 60443c88f3a8 ("kallsyms: Improve the performance of kallsyms_lookup_name()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the resource is the first in the bulk_move range, adding it again
(thus moving it to the tail) will corrupt the list since the first
pointer is not moved. This eventually leads to a null pointer dereference
in ttm_lru_bulk_move_del().
Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5") Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> CC: stable@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20230622141902.28718-3-Yunxiang.Li@amd.com Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if
they have referenced variables") added a check to fail histogram creation
if save_hist_vars() failed to add the histogram to the hist_vars list. But
the commit failed to set ret to the failure return code before jumping to
unregister the histogram; fix it.
Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfella@purestorage.com Cc: stable@vger.kernel.org Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables") Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a long-standing metadata corruption issue that happens from
time to time but is very difficult to reproduce and analyse. Thanks to
the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process missing to write out some buffers which are raced
by another do_get_write_access(). See below for details.
jbd2_log_do_checkpoint() // transaction X
 // buffer A is dirty and does not belong to any transaction
 __buffer_relink_io() // move it to the IO list
 __flush_batch()
  write_dirty_buffer()
                                    do_get_write_access()
                                    clear_buffer_dirty
                                    __jbd2_journal_file_buffer()
                                    // add buffer A to a new transaction Y
   lock_buffer(bh)
 // doesn't write out
 __jbd2_journal_remove_checkpoint()
 // finish checkpoint except buffer A
 // filesystem corrupt if the new transaction Y isn't fully written out
Since the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
already handles waiting for buffers under IO and for buffers re-added to a
new transaction to complete commit, and since it also removes cleaned
buffers, the list is guaranteed to eventually become empty. So it is fine
to leave buffers on the t_checkpoint_list while flushing them out, and to
completely stop using the t_checkpoint_io_list.
Cc: stable@vger.kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Tested-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mdio_bus_init() and phy_driver_register() both have error paths, and if
those are ever hit, ethtool will have a stale pointer to the
phy_ethtool_phy_ops stub structure, which references memory from a
module that failed to load (phylib).
It is probably hard to force an error in this code path even manually,
but the error teardown path of phy_init() should be the same as
phy_exit(), which is now simply not the case.
Fixes: 1536e2857bd3 ("tcp: Add a TCP_FASTOPEN socket option to get a max backlog on its listner") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
do_tcp_getsockopt() reads tp->tsoffset while another cpu
might change its value.
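The usual annotation for such a lockless read/write pair, sketched (the field access paths are illustrative):

    /* writer side (e.g. the setsockopt path) */
    WRITE_ONCE(tp->tsoffset, val);

    /* reader side (do_tcp_getsockopt), may run on another CPU */
    val = READ_ONCE(tp->tsoffset);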
Fixes: 93be6ce0e91b ("tcp: set and get per-socket timestamp") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230719212857.3943972-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
KASAN reports that there's a use-after-free in
hci_remove_adv_monitor(). Trawling through the disassembly, you can
see that the complaint is from the access in bt_dev_dbg() under the
HCI_ADV_MONITOR_EXT_MSFT case. The problem case happens because
msft_remove_monitor() can end up freeing the monitor
structure. Specifically:
hci_remove_adv_monitor() ->
msft_remove_monitor() ->
msft_remove_monitor_sync() ->
msft_le_cancel_monitor_advertisement_cb() ->
hci_free_adv_monitor()
Let's fix the problem by just stashing the relevant data when it's
still valid.
Fixes: 7cf5c2978f23 ("Bluetooth: hci_sync: Refactor remove Adv Monitor") Signed-off-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
sk->sk_state indicates whether iso_pi(sk)->conn is valid. Operations
that check/update sk_state and access conn should hold lock_sock,
otherwise they can race.
The order of taking locks is hci_dev_lock > lock_sock > iso_conn_lock,
which is how it is in connect/disconnect_cfm -> iso_conn_del ->
iso_chan_del.
Fix locking in iso_connect_cis/bis and sendmsg/recvmsg to take lock_sock
around updating sk_state and conn.
iso_conn_del must not occur during iso_connect_cis/bis, as it frees the
iso_conn. Hold hdev->lock longer to prevent that.
This should not reintroduce the issue fixed in commit 241f51931c35
("Bluetooth: ISO: Avoid circular locking dependency"), since we
acquire the locks in order. We retain the fix in iso_sock_connect to release
lock_sock before iso_connect_* acquires hdev->lock.
Similarly for commit 6a5ad251b7cd ("Bluetooth: ISO: Fix possible
circular locking dependency"). We retain the fix in iso_conn_ready to
not acquire iso_conn_lock before lock_sock.
iso_conn_add shall return iso_conn with valid hcon. Make it so also when
reusing an old CIS connection waiting for disconnect timeout (see
__iso_sock_close where conn->hcon is set to NULL).
In hci_cs_disconnect, we do hci_conn_del even if disconnection failed.
ISO, L2CAP and SCO connections refer to the hci_conn without
hci_conn_get, so disconn_cfm must be called so they can clean up their
conn, otherwise use-after-free occurs.
Fixes: b8d290525e39 ("Bluetooth: clean up connection in hci_cs_disconnect") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
hci_update_accept_list_sync iterates over hdev->pend_le_conns and
hdev->pend_le_reports, and waits for controller events in the loop body,
without holding hdev lock.
Meanwhile, these lists and the items may be modified e.g. by
le_scan_cleanup. This can invalidate the list cursor or any other item
in the list, resulting to invalid behavior (eg use-after-free).
Use RCU for the hci_conn_params action lists. Since the loop bodies in
hci_sync block and we cannot use RCU or hdev->lock for the whole loop,
copy list items first and then iterate on the copy. Only the flags field
is written from elsewhere, so READ_ONCE/WRITE_ONCE should guarantee we
read valid values.
Free params everywhere with hci_conn_params_free so the cleanup is
guaranteed to be done properly.
This fixes the following, which can be triggered e.g. by BlueZ new
mgmt-tester case "Add + Remove Device Nowait - Success", or by changing
hci_le_set_cig_params to always return false, and running iso-tester:
==================================================================
BUG: KASAN: slab-use-after-free in hci_update_passive_scan_sync (net/bluetooth/hci_sync.c:2536 net/bluetooth/hci_sync.c:2723 net/bluetooth/hci_sync.c:2841)
Read of size 8 at addr ffff888001265018 by task kworker/u3:0/32
Fixes: e8907f76544f ("Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The end key should be equal to the start key unless NFT_SET_EXT_KEY_END is
present. It is possible to add elements that only have a start key
("{ 1.0.0.0 . 2.0.0.0 }") without an interval end.
Insertion treats this via:
if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
end = (const u8 *)nft_set_ext_key_end(ext)->data;
else
end = start;
but removal side always uses nft_set_ext_key_end().
This is wrong and leads to garbage remaining in the set after removal;
the next lookup/insert attempt will give:
BUG: KASAN: slab-use-after-free in pipapo_get+0x8eb/0xb90
Read of size 1 at addr ffff888100d50586 by task nft-pipapo_uaf_/1399
Call Trace:
kasan_report+0x105/0x140
pipapo_get+0x8eb/0xb90
nft_pipapo_insert+0x1dc/0x1710
nf_tables_newsetelem+0x31f5/0x4e00
..
Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: lonial con <kongln9170@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
On some platforms there is a padding hole in the nft_verdict
structure, between the verdict code and the chain pointer.
On element insertion, if the new element clashes with an existing one and
NLM_F_EXCL flag isn't set, we want to ignore the -EEXIST error as long as
the data associated with duplicated element is the same as the existing
one. The data equality check uses memcmp.
For normal data (NFT_DATA_VALUE) this works fine, but for NFT_DATA_VERDICT
padding area leads to spurious failure even if the verdict data is the
same.
This then makes the insertion fail with 'already exists' error, even
though the new "key : data" matches an existing entry and userspace
told the kernel that it doesn't want to receive an error indication.
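A generic illustration of the underlying problem and the usual remedy: zero the object before filling it so that memcmp() over padding holes is deterministic. This shows the technique in general, not the literal upstream hunk.

    struct nft_verdict v;

    memset(&v, 0, sizeof(v));   /* clears the padding between code and chain */
    v.code  = NFT_CONTINUE;
    v.chain = NULL;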
Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The generic fixup for CS35L41 amplifiers should not have a vendor-specific
chained fixup. For ThinkPad laptops with the LED issue, we can just add a
specific fixup.
Fixes: a6ac60b36dade (ALSA: hda/realtek: Fix mute led issue on thinkpad with cs35l41 s-codec) Signed-off-by: Vitaly Rodionov <vitalyr@opensource.cirrus.com> Link: https://lore.kernel.org/r/20230720082022.13033-1-vitalyr@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
This function misses checking the result of the platform_get_irq() call and
may pass negative error codes to request_irq(), which takes an unsigned IRQ
number, causing it to fail with -EINVAL and overriding the original error code.
Fix this by not calling request_irq() with invalid IRQ numbers.
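The general shape of this kind of fix (the probe-path names here are illustrative):

    irq = platform_get_irq(pdev, 0);
    if (irq < 0)
            return irq;     /* propagate the original error code */

    ret = request_irq(irq, my_isr, 0, dev_name(&pdev->dev), priv);
    if (ret)
            return ret;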
Commit 3f4ca5fafc08 ("tcp: avoid the lookup process failing to get sk in
ehash table") reversed the order in how a socket is inserted into ehash
to fix an issue that ehash-lookup could fail when reqsk/full sk/twsk are
swapped. However, it introduced another lookup failure.
The full socket in ehash is allocated from a slab with SLAB_TYPESAFE_BY_RCU
and does not have SOCK_RCU_FREE, so the socket could be reused even while
it is being referenced on another CPU doing RCU lookup.
Let's say a socket is reused and inserted into the same hash bucket during
lookup. After the blamed commit, a new socket is inserted at the end of
the list. If that happens, we will skip sockets placed after the previous
position of the reused socket, resulting in ehash lookup failure.
As described in Documentation/RCU/rculist_nulls.rst, we should insert a
new socket at the head of the list to avoid such an issue.
This issue, the swap-lookup-failure, and another variant reported in [0]
can all be handled properly by adding a locked ehash lookup suggested by
Eric Dumazet [1].
However, this issue could occur for every packet, thus more likely than
the other two races, so let's revert the change for now.
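For reference, this is the head-insertion pattern that Documentation/RCU/rculist_nulls.rst calls for (a sketch; the TCP code has its own wrappers around it):

    /* insert at the head so a concurrent RCU lookup that observes the
     * reused object cannot skip over entries added behind it */
    hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list);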
The key might contain the private part of the key, so it is better to use
kfree_sensitive() to free it.
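The change amounts to switching the free helper (sketch):

    kfree_sensitive(key);   /* zeroizes the buffer before freeing it */
    /* instead of: kfree(key); */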
Fixes: 38320c70d282 ("[IPSEC]: Use crypto_aead and authenc in ESP") Signed-off-by: Wang Ming <machel@vivo.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 32078 Comm: syz-executor.4 Not tainted 6.5.0-rc1-syzkaller-00033-geb26cbb1a754 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230717144445.653164-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
When using IPv4/TCP, skb->hash comes from sk->sk_txhash except in
TIME_WAIT and SYN_RECV where it's not set in the reply skb from
ip_send_unicast_reply. Those packets will have a mismatched hash with
others from the same flow as their hashes will be 0. IPv6 does not have
the same issue as the hash is set from the socket txhash in those cases.
This commit sets the hash in the reply skb from ip_send_unicast_reply,
which makes the IPv4 code behave like IPv6.
Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Stable-dep-of: 5e5265522a9a ("tcp: annotate data-races around tcp_rsk(req)->txhash") Signed-off-by: Sasha Levin <sashal@kernel.org>
In normal operation, each populated queue item has
next_to_watch pointing to the last TX desc of the packet,
while each cleaned item has it set to 0. In particular,
next_to_use that points to the next (necessarily clean)
item to use has next_to_watch set to 0.
When the TX queue is used both by an application using
AF_XDP with ZEROCOPY as well as a second non-XDP application
generating high traffic, the queue pointers can get in
an invalid state where next_to_use points to an item
where next_to_watch is NOT set to 0.
However, the implementation assumes at several places
that this is never the case, so if it does hold,
bad things happen. In particular, within the loop inside
of igc_clean_tx_irq(), next_to_clean can overtake next_to_use.
Finally, this prevents any further transmission via
this queue and it never gets unblocked or signaled.
Secondly, if the queue is in this garbled state,
the inner loop of igc_clean_tx_ring() will never terminate,
completely hogging a CPU core.
The reason is that igc_xdp_xmit_zc() reads next_to_use
before acquiring the lock and writes it back
(potentially unmodified) later. If it got modified
before locking, the outdated next_to_use is written back,
pointing to an item that was already used elsewhere
(and thus had next_to_watch written).
Fixes: 9acf59a752d4 ("igc: Enable TX via AF_XDP zero-copy") Signed-off-by: Florian Kauer <florian.kauer@linutronix.de> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230717175444.3217831-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
|NETDEV WATCHDOG: enp3s0 (igc): transmit queue 2 timed out
The reason is that the Tx queue transmission start (txq->trans_start) is not
updated in the XDP code path. Therefore, add it for all XDP transmission functions.
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: 78adb4bcf99e ("igc: Prevent garbled TX queue with XDP ZEROCOPY") Signed-off-by: Sasha Levin <sashal@kernel.org>
When running an freplace-attached bpf program on an arm64 system, we were
seeing the following issue:
Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI
After a bit of work to track it down I determined that what appeared to be
happening is that the 'bti c' at the start of the program was somehow being
reached after a 'br' instruction. Further digging pointed me toward the
fact that the function was attached via freplace. This in turn led me to
build_plt which I believe is invoking the long jump which is triggering
this error.
To resolve it we can replace the 'bti c' with 'bti jc' and add a comment
explaining why this has to be modified as such.
While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback will have been directly called, since unlike non-async
callbacks it skips their instruction exploration as they don't
contribute to stack depth.
It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.
Consider functions with following stack depths:
main: 256
async: 256
foo: 256
main:
rX = async
bpf_timer_set_callback(...)
async:
foo()
Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.
Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.
The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.
This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.
The current driver enables backpressure for LBK interfaces,
but these interfaces do not support this feature.
Hence, this patch fixes the issue by skipping the
backpressure configuration for these interfaces.
Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool"). Signed-off-by: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Link: https://lore.kernel.org/r/20230716093741.28063-1-gakula@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
security/keys/trusted-keys/trusted_tpm2.c:203: warning: expecting prototype for tpm_buf_append_auth(). Prototype was for tpm2_buf_append_auth() instead.
The reset task is currently scheduled from the watchdog or adminq tasks.
First, all direct calls to schedule the reset task are replaced with the
iavf_schedule_reset(), which is modified to accept the flag showing the
type of reset.
To prevent the reset task from starting once iavf_remove() starts, we need
to check the __IAVF_IN_REMOVE_TASK bit before we schedule it. This is now
easily added to iavf_schedule_reset().
Finally, remove the check for IAVF_FLAG_RESET_NEEDED in the watchdog task.
It is redundant since all callers that set the flag immediately schedule
the reset task.
Fixes: 3ccd54ef44eb ("iavf: Fix init state closure on remove") Fixes: 14756b2ae265 ("iavf: Fix __IAVF_RESETTING state usage") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
A driver's lock (crit_lock) is used to serialize all the driver's tasks.
Lockdep, however, shows a circular dependency between rtnl and
crit_lock. This happens when an ndo callback that already holds the rtnl
lock requests the driver to reset, since the reset task (in some paths)
tries to grab rtnl to either change the real number of queues or update
netdev features.
[566.241851] ======================================================
[566.241893] WARNING: possible circular locking dependency detected
[566.241936] 6.2.14-100.fc36.x86_64+debug #1 Tainted: G OE
[566.241984] ------------------------------------------------------
[566.242025] repro.sh/2604 is trying to acquire lock:
[566.242061] ffff9280fc5ceee8 (&adapter->crit_lock){+.+.}-{3:3}, at: iavf_close+0x3c/0x240 [iavf]
[566.242167]
but task is already holding lock:
[566.242209] ffffffff9976d350 (rtnl_mutex){+.+.}-{3:3}, at: iavf_remove+0x6b5/0x730 [iavf]
[566.242300]
which lock already depends on the new lock.
The deadlock can be triggered by a script that continuously resets
the VF adapter while doing other operations that require RTNL, e.g.:
while :; do
ip link set $VF up
ethtool --set-channels $VF combined 2
ip link set $VF down
ip link set $VF up
ethtool --set-channels $VF combined 4
ip link set $VF down
done
Any operation that triggers a reset can substitute for "ethtool --set-channels".
As a fix, add a new task, "finish_config", that does all the work which
needs the rtnl lock. With the exception of iavf_remove(), all work that
requires rtnl should be called from this task.
As for iavf_remove(), at the point where we need to call
unregister_netdevice() (and grab rtnl_lock), we make sure the finish_config
task is not running (cancel_work_sync()) to safely grab rtnl. Subsequent
finish_config work cannot restart after that since the task is guarded
by the __IAVF_IN_REMOVE_TASK bit in iavf_schedule_finish_config().
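A minimal sketch of that guard (illustrative; it assumes a per-adapter
workqueue and a finish_config work item):

void iavf_schedule_finish_config(struct iavf_adapter *adapter)
{
        /* Once iavf_remove() has set __IAVF_IN_REMOVE_TASK and cancelled
         * the work, it must not be re-queued.
         */
        if (!test_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section))
                queue_work(adapter->wq, &adapter->finish_config);
}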
Fixes: 5ac49f3c2702 ("iavf: use mutexes for locking of critical sections") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
There was a failure when trying to add the interface to a bond
right after changing the MTU on the interface. It was caused
by the bonding interface being unable to open the interface, because
the interface was still in the __RESETTING state due to the MTU change.
Add new reset_waitqueue to indicate that reset has finished.
Add waiting for reset to finish in callbacks which trigger hw reset:
iavf_set_priv_flags(), iavf_change_mtu() and iavf_set_ringparam().
We use a 5000ms timeout period because on Hyper-V based systems,
this operation takes around 3000-4000ms. In normal circumstances,
it doesn't take more than 500ms to complete.
Add a function iavf_wait_for_reset() to reuse the wait-for-reset code,
and also use it in iavf_set_channels(), which already waits for the reset.
We don't add error handling in iavf_set_channels(), as this could leave
the device in an incorrect state if the reset was scheduled but the wait
hit the timeout or the waiting function was interrupted by a signal.
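A minimal sketch of such a helper, assuming the new reset_waitqueue is
woken when the reset task completes; the wake condition shown here (the
adapter being back in __IAVF_RUNNING) is illustrative:

int iavf_wait_for_reset(struct iavf_adapter *adapter)
{
        /* 5000 ms covers the 3000-4000 ms worst case seen on Hyper-V. */
        int ret = wait_event_interruptible_timeout(adapter->reset_waitqueue,
                                        adapter->state == __IAVF_RUNNING,
                                        msecs_to_jiffies(5000));

        if (ret < 0)            /* interrupted by a signal */
                return ret;
        if (!ret)               /* timed out */
                return -EBUSY;
        return 0;
}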
Fixes: 4e5e6b5d9d13 ("iavf: Fix return of set the new channel count") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Co-developed-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Dawid Wesierski <dawidx.wesierski@intel.com> Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com> Signed-off-by: Kamil Maziarz <kamil.maziarz@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When the user disables rxvlan offloading and then changes the number of
channels, all VLAN ports are unable to receive traffic.
Changing the number of channels triggers a VFR reset. During re-init, when
VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS is received, we do:
1 - set the IAVF_FLAG_SETUP_NETDEV_FEATURES flag
2 - call
iavf_set_vlan_offload_features(adapter, 0, netdev->features);
The second step sends to the PF the __default__ features, in this case
aq_required |= IAVF_FLAG_AQ_ENABLE_CTAG_VLAN_STRIPPING
While the first step forces the watchdog task to call
netdev_update_features() -> iavf_set_features() ->
iavf_set_vlan_offload_features(adapter, netdev->features, features).
Since the user disabled the "rxvlan", this sets:
aq_required |= IAVF_FLAG_AQ_DISABLE_CTAG_VLAN_STRIPPING
When we start processing the AQ commands, both flags are enabled. Since we
process DISABLE_XTAG first then ENABLE_XTAG, this results in the PF
enabling the rxvlan offload. This breaks all communications on the VLAN
net devices.
Fix this by removing the call to iavf_set_vlan_offload_features() (the
second step). Calling netdev_update_features() from the watchdog task is
enough for both the init and reset paths.
Fixes: 7598f4b40bd6 ("iavf: Move netdev_update_features() into watchdog task") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger it") Signed-off-by: Sasha Levin <sashal@kernel.org>
Remove netdev_update_features() from iavf_adminq_task(), as it can cause
deadlocks due to needing rtnl_lock. Instead use the
IAVF_FLAG_SETUP_NETDEV_FEATURES flag to indicate that netdev features need
to be updated in the watchdog task. iavf_set_vlan_offload_features()
and iavf_set_queue_vlan_tag_loc() can be called directly from
iavf_virtchnl_completion().
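A rough sketch of the deferral (the helper name below is hypothetical;
upstream the logic sits inline in the watchdog task): the adminq path
only sets IAVF_FLAG_SETUP_NETDEV_FEATURES, and the watchdog task, which
may safely take rtnl_lock, performs the update:

static void iavf_setup_netdev_features(struct iavf_adapter *adapter)
{
        if (!(adapter->flags & IAVF_FLAG_SETUP_NETDEV_FEATURES))
                return;

        /* netdev_update_features() must run under rtnl_lock, which is
         * why it cannot be called from the adminq task.
         */
        rtnl_lock();
        netdev_update_features(adapter->netdev);
        rtnl_unlock();
        adapter->flags &= ~IAVF_FLAG_SETUP_NETDEV_FEATURES;
}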
Suggested-by: Phani Burra <phani.r.burra@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Stable-dep-of: c2ed2403f12c ("iavf: Wait for reset in callbacks which trigger it") Signed-off-by: Sasha Levin <sashal@kernel.org>
If the system tries to close the netdev while iavf_reset_task() is
running, __LINK_STATE_START will be cleared and netif_running() will
return false in iavf_reinit_interrupt_scheme(). This will result in
iavf_free_traffic_irqs() not being called and a leak as follows:
[7632.489326] remove_proc_entry: removing non-empty directory 'irq/999', leaking at least 'iavf-enp24s0f0v0-TxRx-0'
[7632.490214] WARNING: CPU: 0 PID: 10 at fs/proc/generic.c:718 remove_proc_entry+0x19b/0x1b0
is shown when pci_disable_msix() is later called. Fix by using the
internal adapter state. The traffic IRQs will always exist if
state == __IAVF_RUNNING.
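A sketch of the idea, with a hypothetical helper name; the point is to
key off the driver's own state machine rather than netif_running():

static bool iavf_traffic_irqs_requested(struct iavf_adapter *adapter)
{
        /* __LINK_STATE_START may already be cleared by a concurrent
         * close, so netif_running() can be false while the traffic IRQs
         * taken in __IAVF_RUNNING still need to be freed.
         */
        return adapter->state == __IAVF_RUNNING;
}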
Fixes: 5b36e8d04b44 ("i40evf: Enable VF to request an alternate queue allocation") Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the channel count is increased while iavf_remove() is running and
waiting for the reset to finish times out, the code returns an error but
still changes num_active_queues directly. Because num_active_queues then
becomes greater than the number of tx/rx_rings[] actually allocated, this
leads to out-of-bounds accesses like the following logs.