It's a bug in the concurrent scenario of unregister_netdevice_many()
and raw_release() as following:
cpu0 cpu1
unregister_netdevice_many(can_dev)
unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this
net_set_todo(can_dev)
raw_release(can_socket)
dev = dev_get_by_index(, ro->ifindex); // dev == NULL
if (dev) { // receivers in dev_rcv_lists not free because dev is NULL
raw_disable_allfilters(, dev, );
dev_put(dev);
}
...
ro->bound = 0;
...
call_netdevice_notifiers(NETDEV_UNREGISTER, )
raw_notify(, NETDEV_UNREGISTER, )
if (ro->bound) // invalid because ro->bound has been set 0
raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed
Add a net_device pointer member in struct raw_sock to record bound
can_dev, and use rtnl_lock to serialize raw_socket members between
raw_bind(), raw_release(), raw_setsockopt() and raw_notify(). Use
ro->dev to decide whether to free receivers in dev_rcv_lists.
Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier") Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/all/20230711011737.1969582-1-william.xuanziyang@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
journal_clean_one_cp_list() and journal_shrink_one_cp_list() are almost
the same, so merge them into journal_shrink_one_cp_list(), remove the
nr_to_scan parameter, always scan and try to free the whole checkpoint
list.
cpu_has_octeon_cache was tied to 0 for generic cpu-features,
whith this generic kernel built for octeon CPU won't boot.
Just enable this flag by cpu_type. It won't hurt orther platforms
because compiler will eliminate the code path on other processors.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Stable-dep-of: 5487a7b60695 ("MIPS: cpu-features: Use boot_cpu_type for CPU type based features") Signed-off-by: Sasha Levin <sashal@kernel.org>
wait till linux guest boots, then hotplug device:
(qemu) device_add qxl,bus=rp1
hotplug on guest side fails with:
pci 0000:01:00.0: [1b36:0100] type 00 class 0x038000
pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x03ffffff]
pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x03ffffff]
pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x00001fff]
pci 0000:01:00.0: reg 0x1c: [io 0x0000-0x001f]
pci 0000:01:00.0: BAR 0: no space for [mem size 0x04000000]
pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x04000000]
pci 0000:01:00.0: BAR 1: no space for [mem size 0x04000000]
pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x04000000]
pci 0000:01:00.0: BAR 2: assigned [mem 0xfe800000-0xfe801fff]
pci 0000:01:00.0: BAR 3: assigned [io 0x1000-0x101f]
qxl 0000:01:00.0: enabling device (0000 -> 0003)
Unable to create vram_mapping
qxl: probe of 0000:01:00.0 failed with error -12
However when using native PCIe hotplug
'-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off'
it works fine, since kernel attempts to reassign unused resources.
Use the same machinery as native PCIe hotplug to (re)assign resources.
Link: https://lore.kernel.org/r/20230424191557.2464760-1-imammedo@redhat.com Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
- It's really the only one where this matters. I tried looking around,
and I didn't find any non-pci vga-compatible controllers for x86
(since that's the only platform where we had this until a few
patches ago), where a driver participating in the aperture claim
dance would interfere.
- I also don't expect that any future bus anytime soon will
not just look like pci towards the OS, that's been the case for like
25+ years by now for practically everything (even non non-x86).
- Also it's a bit funny if we have one part of the vga removal in the
pci function, and the other in the generic one.
v2: Rebase.
v4:
- fix Daniel's S-o-b address
v5:
- add back an S-o-b tag with Daniel's Intel address
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-6-tzimmermann@suse.de
Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
Otherwise it's a bit silly, and we might throw out the driver for the
screen the user is actually looking at. I haven't found a bug report
for this case yet, but we did get bug reports for the analog case
where we're throwing out the efifb driver.
v2: Flip the check around to make it clear it's a special case for
kicking out the vgacon driver only (Thomas)
Only really pci devices have a business setting this - it's for
figuring out whether the legacy vga stuff should be nuked too. And
with the preceding two patches those are all using the pci version of
this.
Which means for all other callers primary == false and we can remove
it now.
v2:
- Reorder to avoid compile fail (Thomas)
- Include gma500, which retained it's called to the non-pci version.
v4:
- fix Daniel's S-o-b address
v5:
- add back an S-o-b tag with Daniel's Intel address
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Deepak Rawat <drawat.floss@gmail.com> Cc: Neil Armstrong <neil.armstrong@linaro.org> Cc: Kevin Hilman <khilman@baylibre.com> Cc: Jerome Brunet <jbrunet@baylibre.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Emma Anholt <emma@anholt.net> Cc: Helge Deller <deller@gmx.de> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: linux-hyperv@vger.kernel.org Cc: linux-amlogic@lists.infradead.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-tegra@vger.kernel.org Cc: linux-fbdev@vger.kernel.org Acked-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-4-tzimmermann@suse.de
Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
This one nukes all framebuffers, which is a bit much. In reality
gma500 is igpu and never shipped with anything discrete, so there should
not be any difference.
v2: Unfortunately the framebuffer sits outside of the pci bars for
gma500, and so only using the pci helpers won't be enough. Otoh if we
only use non-pci helper, then we don't get the vga handling, and
subsequent refactoring to untangle these special cases won't work.
It's not pretty, but the simplest fix (since gma500 really is the only
quirky pci driver like this we have) is to just have both calls.
v4:
- fix Daniel's S-o-b address
v5:
- add back an S-o-b tag with Daniel's Intel address
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230406132109.32050-2-tzimmermann@suse.de
Stable-dep-of: 5ae3716cfdcd ("video/aperture: Only remove sysfb on the default vga pci device") Signed-off-by: Sasha Levin <sashal@kernel.org>
On server-initiated disconnect, rpcrdma_xprt_disconnect() was DMA-
unmapping the Receive buffers, but rpcrdma_post_recvs() neglected
to remap them after a new connection had been established. The
result was immediate failure of the new connection with the Receives
flushing with LOCAL_PROT_ERR.
Fixes: 671c450b6fe0 ("xprtrdma: Fix oops in Receive handler after device removal") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Another highly rare error case when a page allocating loop (inside
__nfs4_get_acl_uncached, this time) is not properly unwound on error.
Since pages array is allocated being uninitialized, need to free only
lower array indices. NULL checks were useful before commit 62a1573fcf84
("NFSv4 fix acl retrieval over krb5i/krb5p mounts") when the array had
been initialized to zero on stack.
Found by Linux Verification Center (linuxtesting.org).
There is a slight issue with error handling code inside
nfs42_proc_getxattr(). If page allocating loop fails then we free the
failing page array element which is NULL but __free_page() can't deal with
NULL args.
Found by Linux Verification Center (linuxtesting.org).
Something is currently broken in the f2fs code, Guenter has reported
boot problems with it for a few releases now, so revert the most recent
f2fs changes in the hope to get this back to a working filesystem.
Something is currently broken in the f2fs code, Guenter has reported
boot problems with it for a few releases now, so revert the most recent
f2fs changes in the hope to get this back to a working filesystem.
Something is currently broken in the f2fs code, Guenter has reported
boot problems with it for a few releases now, so revert the most recent
f2fs changes in the hope to get this back to a working filesystem.
- it collects all (tail) call's of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
RET also emits this same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate these two separate things.
However, alas, along comes SRSO and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previous).
Clear things up by adding a second category for the embedded instruction
thing.
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
".text .text.[0-9a-zA-Z_]*" which wrongly matches also the thunk
sections. The output layout is then different than expected. For
instance, the currently defined range [__indirect_thunk_start,
__indirect_thunk_end] becomes empty.
Prevent the problem by using ".." as the first separator, for example,
".text..__x86.indirect_thunk". This pattern is utilized by other
explicit section names which start with one of the standard prefixes,
such as ".text" or ".data", and that need to be individually selected in
the linker script.
[ nathan: Fix conflicts with SRSO and fold in fix issue brought up by
Andrew Cooper in post-review:
https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]
Fixes: dc5723b02e52 ("kbuild: add support for Clang LTO") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Skip the srso cmd line parsing which is not needed on Zen1/2 with SMT
disabled and with the proper microcode applied (latter should be the
case anyway) as those are not affected.
Fixes: 5a15d8348881 ("x86/srso: Tie SBPB bit setting to microcode patch detection") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20230813104517.3346-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Initially, it was thought that doing an innocuous division in the #DE
handler would take care to prevent any leaking of old data from the
divider but by the time the fault is raised, the speculation has already
advanced too far and such data could already have been used by younger
operations.
Therefore, do the innocuous division on every exit to userspace so that
userspace doesn't see any potentially old data from integer divisions in
kernel space.
Do the same before VMRUN too, to protect host data from leaking into the
guest too.
Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Use LEA instead of ADD when adjusting %rsp in srso_safe_ret{,_alias}()
so as to avoid clobbering flags. Drop one of the INT3 instructions to
account for the LEA consuming one more byte than the ADD.
KVM's emulator makes indirect calls into a jump table of sorts, where
the destination of each call is a small blob of code that performs fast
emulation by executing the target instruction with fixed operands.
E.g. to emulate ADC, fastop() invokes adcb_al_dl():
A major motivation for doing fast emulation is to leverage the CPU to
handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
both an input and output to the target of the call. fastop() collects
the RFLAGS result by pushing RFLAGS onto the stack and popping them back
into a variable (held in %rdi in this case):
asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
Christian reported spurious module load crashes after some of Song's
module memory layout patches.
Turns out that if the very last instruction on the very last page of the
module is a 'JMP __x86_return_thunk' then __static_call_fixup() will
trip a fault and die.
And while the module rework made this slightly more likely to happen,
it's always been possible.
Fixes: ee88d363d156 ("x86,static_call: Use alternative RET encoding") Reported-by: Christian Bricart <christian@bricart.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lkml.kernel.org/r/20230816104419.GA982867@hirez.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Since there can only be one active return_thunk, there only needs be
one (matching) untrain_ret. It fundamentally doesn't make sense to
allow multiple untrain_ret at the same time.
Fold all the 3 different untrain methods into a single (temporary)
helper stub.
srso_safe_ret: // embedded in movabs instruction
add $8,%rsp
ret
int3
srso_return_thunk:
call srso_safe_ret
ud2
While retbleed does:
zen_untrain_ret:
test $0xcc, %bl
lfence
jmp zen_return_thunk
int3
zen_return_thunk: // embedded in the test instruction
ret
int3
Where Zen1/2 flush the BTB entry using the instruction decoder trick
(test,movabs) Zen3/4 use BTB aliasing. SRSO adds a return sequence
(srso_safe_ret()) which forces the function return instruction to
speculate into a trap (UD2). This RET will then mispredict and
execution will continue at the return site read from the top of the
stack.
Pick one of three options at boot (evey function can only ever return
once).
[ bp: Fixup commit message uarch details and add them in a comment in
the code too. Add a comment about the srso_select_mitigation()
dependency on retbleed_select_mitigation(). Add moar ifdeffery for
32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
32-bit alternatives needing the symbol. ]
There is infrastructure to rewrite return thunks to point to any
random thunk one desires, unwrap that from CALL_THUNKS, which up to
now was the sole user of that.
[ bp: Make the thunks visible on 32-bit and add ifdeffery for the
32-bit builds. ]
vmlinux.o: warning: objtool: srso_untrain_ret() falls through to next function __x86_return_skl()
vmlinux.o: warning: objtool: __x86_return_thunk() falls through to next function __x86_return_skl()
This is because these functions (can) end with CALL, which objtool
does not consider a terminating instruction. Therefore, replace the
INT3 instruction (which is a non-fatal trap) with UD2 (which is a
fatal-trap).
This indicates execution will not continue past this point.
In the real workload, I encountered an issue which could cause the RTO
timer to retransmit the skb per 1ms with linear option enabled. The amount
of lost-retransmitted skbs can go up to 1000+ instantly.
The root cause is that if the icsk_rto happens to be zero in the 6th round
(which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero
due to the changed calculation method in tcp_retransmit_timer() as follows:
Above line could be converted to
icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0
Therefore, the timer expires so quickly without any doubt.
I read through the RFC 6298 and found that the RTO value can be rounded
up to a certain value, in Linux, say TCP_RTO_MIN as default, which is
regarded as the lower bound in this patch as suggested by Eric.
Fixes: 36e31b0af587 ("net: TCP thin linear timeouts") Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We can't simply free the connector after calling drm_connector_init on it.
We need to clean up the drm side first.
It might not fix all regressions from commit 2b5d1c29f6c4
("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts"),
but at least it fixes a memory corruption in error handling related to
that commit.
af_unix: Fix null-ptr-deref in unix_stream_sendpage().
Bing-Jhong Billy Jheng reported null-ptr-deref in unix_stream_sendpage()
with detailed analysis and a nice repro.
unix_stream_sendpage() tries to add data to the last skb in the peer's
recv queue without locking the queue.
If the peer's FD is passed to another socket and the socket's FD is
passed to the peer, there is a loop between them. If we close both
sockets without receiving FD, the sockets will be cleaned up by garbage
collection.
The garbage collection iterates such sockets and unlinks skb with
FD from the socket's receive queue under the queue's lock.
So, there is a race where unix_stream_sendpage() could access an skb
locklessly that is being released by garbage collection, resulting in
use-after-free.
To avoid the issue, unix_stream_sendpage() must lock the peer's recv
queue.
Note the issue does not exist in 6.5+ thanks to the recent sendpage()
refactoring.
This patch is originally written by Linus Torvalds.
This can clean up all irq warnings because of unbalanced
amdgpu_irq_get/put when unplugging/unbinding device, and leave
irq count decrease in each ip fini function.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we use NT_ARM_SSVE to either enable streaming mode or change the
vector length for a process we do not currently do anything to ensure that
there is storage allocated for the SME specific register state. If the
task had not previously used SME or we changed the vector length then
the task will not have had TIF_SME set or backing storage for ZA/ZT
allocated, resulting in inconsistent register sizes when saving state
and spurious traps which flush the newly set register state.
We should set TIF_SME to disable traps and ensure that storage is
allocated for ZA and ZT if it is not already allocated. This requires
modifying sme_alloc() to make the flush of any existing register state
optional so we don't disturb existing state for ZA and ZT.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers") Reported-by: David Spickett <David.Spickett@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: <stable@vger.kernel.org> # 5.19.x Link: https://lore.kernel.org/r/20230810-arm64-fix-ptrace-race-v1-1-a5361fad2bd6@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In SCTP protocol, it is using the same timer (T2 timer) for SHUTDOWN and
SHUTDOWN_ACK retransmission. However in sctp conntrack the default timeout
value for SCTP_CONNTRACK_SHUTDOWN_ACK_SENT state is 3 secs while it's 300
msecs for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV state.
As Paolo Valerio noticed, this might cause unwanted expiration of the ct
entry. In my test, with 1s tc netem delay set on the NAT path, after the
SHUTDOWN is sent, the sctp ct entry enters SCTP_CONNTRACK_SHUTDOWN_SEND
state. However, due to 300ms (too short) delay, when the SHUTDOWN_ACK is
sent back from the peer, the sctp ct entry has expired and been deleted,
and then the SHUTDOWN_ACK has to be dropped.
Also, it is confusing these two sysctl options always show 0 due to all
timeout values using sec as unit:
This patch fixes it by also using 3 secs for sctp shutdown send and recv
state in sctp conntrack, which is also RTO.initial value in SCTP protocol.
Note that the very short time value for SCTP_CONNTRACK_SHUTDOWN_SEND/RECV
was probably used for a rare scenario where SHUTDOWN is sent on 1st path
but SHUTDOWN_ACK is replied on 2nd path, then a new connection started
immediately on 1st path. So this patch also moves from SHUTDOWN_SEND/RECV
to CLOSE when receiving INIT in the ORIGINAL direction.
Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") Reported-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Patch series "Fix hugetlb free path race with memory errors".
In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
HWPOISON hugepages" the race window was discovered.
https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
Freeing a hugetlb page back to low level memory allocators is performed
in two steps.
1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
2) Outside lock, allocate vmemmap if necessary and call low level free
Between these two steps, the hugetlb page will appear as a normal
compound page. However, vmemmap for tail pages could be missing.
If a memory error occurs at this time, we could try to update page
flags non-existant page structs.
A much more detailed description is in the first patch.
The first patch addresses the race window. However, it adds a
hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
free operation. This could lead to slowdowns if one is freeing a large
number of hugetlb pages.
The second path optimizes the update_and_free_pages_bulk routine to only
take the lock once in bulk operations.
The second patch is technically not a bug fix, but includes a Fixes tag
and Cc stable to avoid a performance regression. It can be combined with
the first, but was done separately make reviewing easier.
This patch (of 2):
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: James Houghton <jthoughton@google.com> Cc: Jiaqi Yan <jiaqiyan@google.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why and How]
Current implementation requires FPGA builds to take a different
code path from DCN32 to write to OTG_PIXEL_RATE_DIV. Now that
we have a workaround to write to OTG_PIXEL_RATE_DIV register without
blanking display on hotplug on DCN32, we can allow the code paths for
FPGA to be exactly the same allowing for more consistent
testing.
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Saaem Rizvi <SyedSaaem.Rizvi@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Limonciello, Mario" <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove the capacity inversion detection which is now handled by
util_fits_cpu() returning -1 when we need to continue to look for a
potential CPU with better performance.
This ends up almost reverting patches below except for some comments:
commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection")
commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()")
commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
By taking into account uclamp_min, the 1:1 relation between task misfit
and cpu overutilized is no more true as a task with a small util_avg may
not fit a high capacity cpu because of uclamp_min constraint.
Add a new state in util_fits_cpu() to reflect the case that task would fit
a CPU except for the uclamp_min hint which is a performance requirement.
Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
can use this new value to take additional action to select the best CPU
that doesn't match uclamp_min hint.
When util_fits_cpu() returns -1, we will continue to look for a possible
CPU with better performance, which replaces Capacity Inversion detection
with capacity_orig_of() - thermal_load_avg to detect a capacity inversion.
Pool compaction is currently (basically) single-threaded as
it is performed under pool->lock. Having multiple compaction
threads results in unnecessary contention, as each thread
competes for pool->lock. This, in turn, affects all zsmalloc
operations such as zs_malloc(), zs_map_object(), zs_free(), etc.
Introduce the pool->compaction_in_progress atomic variable,
which ensures that only one compaction context can run at a
time. This reduces overall pool->lock contention in (corner)
cases when many contexts attempt to shrink zspool simultaneously.
Link: https://lkml.kernel.org/r/20230418074639.1903197-1-senozhatsky@chromium.org Fixes: c0547d0b6a4b ("zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The vangogh driver just gained a link time dependency that now causes
randconfig builds to fail:
x86_64-linux-ld: sound/soc/amd/vangogh/pci-acp5x.o: in function `snd_acp5x_probe':
pci-acp5x.c:(.text+0xbb): undefined reference to `snd_amd_acp_find_config'
Fixes: e89f45edb747e ("ASoC: amd: vangogh: Add check for acp config flags in vangogh platform") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230602124447.863476-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
GFX v11.0.1 reported fence fallback timer expired issue on
SDMA and GFX rings after S0ix resume. This is generated by
EOP interrupts are disabled when S0ix suspend but fails to
re-enable when resume because of the GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0
[ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in GFXOFF state, avoid to touch the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupts configuration will be restored by GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry. This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.
To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.
commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.
This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code. Remove that dead code.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3f9ffce5765d ("drm/i915: Do panel VBT init early if the VBT
declares an explicit panel type") started using -1 as the value for
unset panel_type. It gets initialized in intel_panel_init_alloc(), but
the SDVO code never calls it.
Call intel_panel_init_alloc() to initialize the panel, including the
panel_type.
Reported-by: Tomi Leppänen <tomi@tomin.site> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8896 Fixes: 3f9ffce5765d ("drm/i915: Do panel VBT init early if the VBT declares an explicit panel type") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Uma Shankar <uma.shankar@intel.com> Tested-by: Tomi Leppänen <tomi@tomin.site> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230803122706.838721-1-jani.nikula@intel.com
(cherry picked from commit 26e60294e8eacedc8ebb33405b2c375fd80e0900) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
qxl_mode_dumb_create() dereferences the qobj returned by
qxl_gem_object_create_with_handle(), but the handle is the only one
holding a reference to it.
A potential attacker could guess the returned handle value and closes it
between the return of qxl_gem_object_create_with_handle() and the qobj
usage, triggering a use-after-free scenario.
Reproducer:
int dri_fd =-1;
struct drm_mode_create_dumb arg = {0};
==================================================================
BUG: KASAN: slab-use-after-free in qxl_mode_dumb_create+0x3c2/0x400 linux/drivers/gpu/drm/qxl/qxl_dumb.c:69
Write of size 1 at addr ffff88801136c240 by task poc/515
The buggy address belongs to the object at ffff88801136c000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 576 bytes inside of
freed 1024-byte region [ffff88801136c000, ffff88801136c400)
Instead of returning a weak reference to the qxl_bo object, return the
created drm_gem_object and let the caller decrement the reference count
when it no longer needs it. As a convenience, if the caller is not
interested in the gobj object, it can pass NULL to the parameter and the
reference counting is descremented internally.
The bug and the reproducer were originally found by the Zero Day Initiative project (ZDI-CAN-20940).
For a completed request, after the mmc_blk_mq_complete_rq(mq, req)
function is executed, the bitmap_tags corresponding to the
request will be cleared, that is, the request will be regarded as
idle. If the request is acquired by a different type of process at
this time, the issue_type of the request may change. It further
caused the value of mq->in_flight[issue_type] to be abnormal,
and a large number of requests could not be sent.
blk_crypto_profile_init() calls lockdep_register_key(), which warns and
does not register if the provided memory is a static object.
blk-crypto-fallback currently has a static blk_crypto_profile and calls
blk_crypto_profile_init() thereupon, resulting in the warning and
failure to register.
Fortunately it is simple enough to use a dynamically allocated profile
and make lockdep function correctly.
Fixes: 2fb48d88e77f ("blk-crypto: use dynamic lock class for blk_crypto_profile::lock") Cc: stable@vger.kernel.org Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Reviewed-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230817141615.15387-1-sweettea-kernel@dorminy.me Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixes an issue affecting the Wifi/Bluetooth connectivity on
ROCK Pi 4 boards. Commit f471b1b2db08 ("arm64: dts: rockchip: Fix Bluetooth
on ROCK Pi 4 boards") introduced a problem with the clock configuration.
Specifically, the clock-names property of the sdio-pwrseq node was not
updated to 'lpo', causing the driver to wait indefinitely for the wrong clock
signal 'ext_clock' instead of the expected one 'lpo'. This prevented the proper
initialization of Wifi/Bluetooth chip on ROCK Pi 4 boards.
To address this, this patch updates the clock-names property of the
sdio-pwrseq node to "lpo" to align with the changes made to the bluetooth node.
This patch has been tested on ROCK Pi 4B.
Fixes: f471b1b2db08 ("arm64: dts: rockchip: Fix Bluetooth on ROCK Pi 4 boards") Cc: stable@vger.kernel.org Signed-off-by: Yogesh Hegde <yogi.kernel@gmail.com> Link: https://lore.kernel.org/r/ZLbATQRjOl09aLAp@zephyrusG14 Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data
for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and
VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands.
According to the VirtIO standard, "Field reserved MUST contain zeroes.
It is defined to make the structure to match the layout of
virtio_net_rss_config structure, defined in 5.1.6.5.7.".
Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq`
field in struct virtio_net_ctrl_rss, which corresponds to the
`reserved` field in struct virtio_net_hash_config, is not zeroed,
thereby violating the VirtIO standard.
This patch solves this problem by zeroing this field in
virtnet_init_default_rss().
Under the current code, when cifs_readpage_worker is called, the call
contract is that the callee should unlock the page. This is documented
in the read_folio section of Documentation/filesystems/vfs.rst as:
> The filesystem should unlock the folio once the read has completed,
> whether it was successful or not.
Without this change, when fscache is in use and cache hit occurs during
a read, the page lock is leaked, producing the following stack on
subsequent reads (via mmap) to the page:
This requires a reboot to resolve; it is a deadlock.
Note however that the call to cifs_readpage_from_fscache does mark the
page clean, but does not free the folio lock. This happens in
__cifs_readpage_from_fscache on success. Releasing the lock at that
point however is not appropriate as cifs_readahead also calls
cifs_readpage_from_fscache and *does* unconditionally release the lock
after its return. This change therefore effectively makes
cifs_readpage_worker work like cifs_readahead.
Signed-off-by: Russell Harmon <russ@har.mn> Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Unloading a hardware specific 8250 driver can produce error "Unable to
handle kernel paging request at virtual address" about ten seconds after
unloading the driver. This happens on uart_hangup() calling
uart_change_pm().
Turns out commit 04e82793f068 ("serial: 8250: Reinit port->pm on port
specific driver unbind") was only a partial fix. If the hardware specific
driver has initialized port->pm function, we need to clear port->pm too.
Just reinitializing port->ops does not do this. Otherwise serial8250_pm()
will call port->pm() instead of serial8250_do_pm().
Fixes: 04e82793f068 ("serial: 8250: Reinit port->pm on port specific driver unbind") Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230804131553.52927-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It was reported that the riscv kernel hangs while executing the test
in [1].
Indeed, the test hangs when trying to write a buffer to a file. The
problem is that the riscv implementation of raw_copy_from_user() does not
return the correct number of bytes not written when an exception happens
and is fixed up, instead it always returns the initial size to copy,
even if some bytes were actually copied.
generic_perform_write() pre-faults the user pages and bails out if nothing
can be written, otherwise it will access the userspace buffer: here the
riscv implementation keeps returning it was not able to copy any byte
though the pre-faulting indicates otherwise. So generic_perform_write()
keeps retrying to access the user memory and ends up in an infinite
loop.
Note that before the commit mentioned in [1] that introduced this
regression, it worked because generic_perform_write() would bail out if
only one byte could not be written.
So fix this by returning the number of bytes effectively not written in
__asm_copy_[to|from]_user() and __clear_user(), as it is expected.
Set spec->en_3kpull_low default to true.
Then fillback ALC236 and ALC257 to false.
Additional note: this addresses a regression caused by the previous
fix 69ea4c9d02b7 ("ALSA: hda/realtek - remove 3k pull low procedure").
The previous workaround was applied too widely without necessity,
which resulted in the pop noise at PM again. This patch corrects the
condition and restores the old behavior for the devices that don't
suffer from the original problem.
The existing use of match_string() caused it to reject 'echo foo' due
to the implicitly appended newline, which was somewhat ergonomically
awkward and inconsistent with typical sysfs behavior. Using the
__sysfs_* variant instead provides more convenient and consistent
linefeed-agnostic behavior.
When the tdm lane mask is computed, the driver currently fills the 1st lane
before moving on to the next. If the stream has less channels than the
lanes can accommodate, slots will be disabled on the last lanes.
Unfortunately, the HW distribute channels in a different way. It distribute
channels in pair on each lanes before moving on the next slots.
This difference leads to problems if a device has an interface with more
than 1 lane and with more than 2 slots per lane.
For example: a playback interface with 2 lanes and 4 slots each (total 8
slots - zero based numbering)
- Playing a 8ch stream:
- All slots activated by the driver
- channel #2 will be played on lane #1 - slot #0 following HW placement
- Playing a 4ch stream:
- Lane #1 disabled by the driver
- channel #2 will be played on lane #0 - slot #2
This behaviour is obviously not desirable.
Change the way slots are activated on the TDM lanes to follow what the HW
does and make sure each channel always get mapped to the same slot/lane.
Fixes: 1a11d88f499c ("ASoC: meson: add tdm formatter base driver") Signed-off-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20230809171931.1244502-1-jbrunet@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Although the memory map of i.MX93 reference manual rev. 2 claims that
analog top has start address of 0x44480000 and end address of 0x4448ffff,
this overlaps with TMU memory area starting at 0x44482000, as stated in
section 73.6.1.
As PLL configuration registers start at addresses up to 0x44481400, as used
by clk-imx93, reduce the anatop size to 0x2000, so exclude the TMU area
but keep all PLL registers inside.
Fixes: ec8b5b5058ea ("arm64: dts: freescale: Add i.MX93 dtsi support") Signed-off-by: Alexander Stein <alexander.stein@ew.tq-group.com> Reviewed-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Jacky Bai <ping.bai@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The CSI1 PHY reference clock is limited to 125 MHz according to:
i.MX 8M Mini Applications Processor Reference Manual, Rev. 3, 11/2020
Table 5-1. Clock Root Table (continued) / page 307
Slice Index n = 123 .
Currently the IMX8MM_CLK_CSI1_PHY_REF clock is configured to be
fed directly from 1 GHz PLL2 , which overclocks them. Instead, drop
the configuration altogether, which defaults the clock to 24 MHz REF
clock input, which for the PHY reference clock is just fine.
Based on a patch from Marek Vasut for the imx8mn.
Fixes: e523b7c54c05 ("arm64: dts: imx8mm: Add CSI nodes") Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Marek Vasut <marex@denx.de> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Reviewed-by: Adam Ford <aford173@gmail.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The node names should be generic and DT schema expects certain pattern:
imx50-kobo-aura.dtb: gpio-leds: 'on' does not match any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'
imx6dl-yapp4-draco.dtb: led-controller@30: 'chan@0', 'chan@1', 'chan@2' do not match any of the regexes: '^led@[0-8]$', '^multi-led@[0-8]$', 'pinctrl-[0-9]+'
There is some instablity with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
There is some instablity with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This ends up resulting in some block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
While we are here, set the eMMC maximum clock frequency to 1.5MHz to
follow the ROCK 4C+.
Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support") Signed-off-by: Christopher Obbard <chris.obbard@collabora.com> Tested-By: Folker Schwesinger <dev@folker-schwesinger.de> Link: https://lore.kernel.org/r/20230705144255.115299-2-chris.obbard@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
The commit 3a786086c6f8 ("arm64: dts: qcom: Add missing "-thermal"
suffix for thermal zones") renamed the thermal zone in the pm8150l.dtsi
file to comply with the schema. However this resulted in a clash with
the RB5 board file, which already contained the pm8150l-thermal zone for
the on-board sensor. This resulted in the board file definition
overriding the thermal zone defined in the PMIC include file (and thus
the on-die PMIC temp alarm was not probing at all).
Rename the thermal zone in qcom/qrb5165-rb5.dts to remove this override.
The am335x devices started producing boot errors for resetting musb module
in because of subtle timing changes:
Unhandled fault: external abort on non-linefetch (0x1008)
...
sysc_poll_reset_sysconfig from sysc_reset+0x109/0x12
sysc_reset from sysc_probe+0xa99/0xeb0
...
The fix is to flush posted write after enable before reset during
probe. Note that some devices also need to specify the delay after enable
with ti,sysc-delay-us, but this is not needed for musb on am335x based on
my tests.
Reported-by: kernelci.org bot <bot@kernelci.org> Closes: https://storage.kernelci.org/next/master/next-20230614/arm/multi_v7_defconfig+CONFIG_THUMB2_KERNEL=y/gcc-10/lab-cip/baseline-beaglebone-black.html Fixes: 596e7955692b ("bus: ti-sysc: Add support for software reset") Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through
PCI PM (power management) interface. However this NIC hardware
does not support PCI PM suspend/resume operations so system wide
suspend/resume leads to bad MFW (management firmware) state which
causes various follow-up errors in driver when communicating with
the device/firmware afterwards.
To fix this driver implements PCI PM suspend handler to indicate
unsupported operation to the PCI subsystem explicitly, thus avoiding
system to go into suspended/standby mode.
Without this fix device/firmware does not recover unless system
is power cycled.
Fixes: 3953c46c3ac7 ("sk_buff: allow segmenting based on frag sizes") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Xin Long <lucien.xin@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20230816142158.1779798-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
So the conditions of leaving global pressure are inconstant, which
may lead to the situation that one pressured net-memcg prevents the
global pressure from being cleared when there is indeed no global
pressure, thus the global constrains are still in effect unexpectedly
on the other sockets.
This patch fixes this by ignoring the net-memcg's pressure when
deciding whether should leave global memory pressure.
In efx_init_tc(), move the setting of efx->tc->up after the
flow_indr_dev_register() call, so that if it fails, efx_fini_tc()
won't call flow_indr_dev_unregister().
Fixes: 5b2e12d51bd8 ("sfc: bind indirect blocks for TC offload on EF100") Suggested-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/a81284d7013aba74005277bd81104e4cfbea3f6f.1692114888.git.ecree.xilinx@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
If the switch is reset during active EEPROM transactions, as in
just after an SoC reset after power up, the I2C bus transaction
may be cut short leaving the EEPROM internal I2C state machine
in the wrong state. When the switch is reset again, the bad
state machine state may result in data being read from the wrong
memory location causing the switch to enter unexpected mode
rendering it inoperational.
Fixes: a3dcb3e7e70c ("net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset") Signed-off-by: Alfred Lee <l00g33k@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230815001323.24739-1-l00g33k@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Return an error if a field's mask is neither full nor empty. When a mask
is only partial the field is not being used for rule programming but it
gives a wrong impression it is used. Fix by returning an error on any
partial mask to make it clear they are not supported.
The ip_ver assignment is moved earlier in code to allow using it in
iavf_validate_fdir_fltr_masks.
Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters") Fixes: e90cbc257a6f ("iavf: Support IPv6 Flow Director filters") Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Recent changes in net-next (commit 759ab1edb56c ("net: store netdevs
in an xarray")) refactored the handling of pre-assigned ifindexes
and let syzbot surface a latent problem in ovs. ovs does not validate
ifindex, making it possible to create netdev ports with negative
ifindex values. It's easy to repro with YNL:
$ ip link show
-65536: some-port0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 7a:48:21:ad:0b:fb brd ff:ff:ff:ff:ff:ff
...
Validate the inputs. Now the second command correctly returns:
Similar to commit 01f4fd270870 ("bonding: Fix incorrect deletion of
ETH_P_8021AD protocol vid from slaves"), we can trigger BUG_ON(!vlan_info)
in unregister_vlan_dev() with the following testcase:
# ip netns add ns1
# ip netns exec ns1 ip link add team1 type team
# ip netns exec ns1 ip link add team_slave type veth peer veth2
# ip netns exec ns1 ip link set team_slave master team1
# ip netns exec ns1 ip link add link team_slave name team_slave.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link add link team1 name team1.10 type vlan id 10 protocol 802.1ad
# ip netns exec ns1 ip link set team_slave nomaster
# ip netns del ns1
Add S-VLAN tag related features support to team driver. So the team driver
will always propagate the VLAN info to its slaves.
Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support") Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230814032301.2804971-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
The 54810 does not support c45. The mmd_phy_indirect accesses return
arbirtary values leading to odd behavior like saying it supports EEE
when it doesn't. We also see that reading/writing these non-existent
MMD registers leads to phy instability in some cases.
tx_timeout_task is canceled too early when removing the driver. Nothing
prevents .ndo_tx_timeout from triggering and queuing the work again.
Better cancel it after the netdev is unregistered.
It's harmless for octep_tx_timeout_task to run in the window between the
unregistration and cancelation, because it checks netif_running.
Fixes: 862cd659a6fb ("octeon_ep: Add driver framework and device initialization") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Link: https://lore.kernel.org/r/20230810150114.107765-3-mschmidt@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
On Zynq UltraScale+ MPSoC ubuntu platform when systemctl issues suspend,
network manager bring down the interface and goes into suspend. When it
wakes up it again enables the interface.
This leads to xilinx-psgtr "PLL lock timeout" on interface bringup, as
the power management controller power down the entire FPD (including
SERDES) if none of the FPD devices are in use and serdes is not
initialized on resume.
$ sudo rtcwake -m no -s 120 -v
$ sudo systemctl suspend <this does ifconfig eth1 down>
$ ifconfig eth1 up
xilinx-psgtr fd400000.phy: lane 0 (type 10, protocol 5): PLL lock timeout
phy phy-fd400000.phy.0: phy poweron failed --> -110
macb driver is called in this way:
1. macb_close: Stop network interface. In this function, it
reset MACB IP and disables PHY and network interface.
2. macb_suspend: It is called in kernel suspend flow. But because
network interface has been disabled(netif_running(ndev) is
false), it does nothing and returns directly;
3. System goes into suspend state. Some time later, system is
waken up by RTC wakeup device;
4. macb_resume: It does nothing because network interface has
been disabled;
5. macb_open: It is called to enable network interface again. ethernet
interface is initialized in this API but serdes which is power-off
by PMUFW during FPD-off suspend is not initialized again and so
we hit GT PLL lock issue on open.
To resolve this PLL timeout issue always do PS GTR initialization
when ethernet device is configured as non-wakeup source.