Dave Airlie [Thu, 2 Dec 2021 19:59:26 +0000 (05:59 +1000)]
Merge tag 'drm-intel-fixes-2021-12-02' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fixing a regression where the backlight brightness control stopped working.
- Fix the Intel HDR backlight support detection.
- Reverting a w/a to fix a gpu Hang in TGL. The w/a itself was also
for a hang, but in a much rarer scenario. The proper solution need
to be done with help from user space and it will be addressed later.
Philip Yang [Mon, 29 Nov 2021 17:33:05 +0000 (12:33 -0500)]
drm/amdkfd: process_info lock not needed for svm
process_info->lock is used to protect kfd_bo_list, vm_list_head, n_vms
and userptr valid/inval list, svm_range_restore_work and
svm_range_set_attr don't access those, so do not need to take
process_info lock. This will avoid potential circular locking issue.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
shaoyunl [Tue, 30 Nov 2021 02:29:05 +0000 (21:29 -0500)]
drm/amdgpu: adjust the kfd reset sequence in reset sriov function
This change revert previous commits: 54af4d05497e ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") a1584b09f7ec ("drm/amdgpu: move kfd post_reset out of reset_sriov function")
This change moves the amdgpu_amdkfd_pre_reset to an earlier place
in amdgpu_device_reset_sriov, presumably to address the sequence issue
that the first patch was originally meant to fix.
Some register access(GRBM_GFX_CNTL) only be allowed on full access
mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov
function.
Fixes: 54af4d05497e ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: a1584b09f7ec ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Perry Yuan [Fri, 19 Nov 2021 09:27:55 +0000 (04:27 -0500)]
drm/amd/display: add connector type check for CRC source set
[Why]
IGT bypass test will set crc source as DPRX,and display DM didn`t check
connection type, it run the test on the HDMI connector ,then the kernel
will be crashed because aux->transfer is set null for HDMI connection.
This patch will skip the invalid connection test and fix kernel crash issue.
[How]
Check the connector type while setting the pipe crc source as DPRX or
auto,if the type is not DP or eDP, the crtc crc source will not be set
and report error code to IGT test,IGT will show the this subtest as no
valid crtc/connector combinations found.
Philip Yang [Fri, 26 Nov 2021 23:43:09 +0000 (18:43 -0500)]
drm/amdkfd: fix double free mem structure
drm_gem_object_put calls release_notify callback to free the mem
structure and unreserve_mem_limit, move it down after the last access
of mem and make it conditional call.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jimmy Kizito [Mon, 15 Nov 2021 02:48:02 +0000 (21:48 -0500)]
drm/amd/display: Add work around for tunneled MST.
[Why]
Certain USB4 docks do not seem to be able to handle disabling
DSC once it has been enabled on an MST stream. This can result
in blank displays.
[How]
As a work around, always enable DSC on docks exhibiting this issue. The
flag to indicate the use of DSC for MST streams on a USB4 dock is set
during detection of the dock and only cleared when the USB4 dock is
disconnected.
Reviewed-by: Jun Lei <Jun.Lei@amd.com> Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Jimmy Kizito <Jimmy.Kizito@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mustapha Ghaddar [Mon, 15 Nov 2021 22:56:42 +0000 (17:56 -0500)]
drm/amd/display: Fix for the no Audio bug with Tiled Displays
[WHY]
It seems like after a series of plug/unplugs we end up in a situation
where tiled display doesnt support Audio.
[HOW]
The issue seems to be related to when we check streams changed after an
HPD, we should be checking the audio_struct as well to see if any of its
values changed.
Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Mustapha Ghaddar <mustapha.ghaddar@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Allow DSC on supported MST branch devices
[Why]
When trying to lightup two 4k60 non-DSC displays behind a branch device
that supports DSC we can't lightup both at once due to bandwidth
limitations - each requires 48 VCPI slots but we only have 63.
[How]
The workaround already exists in the code but is guarded by a CONFIG
that cannot be set by the user and shouldn't need to be.
Check for specific branch device IDs to device whether to enable
the workaround for multiple display scenarios.
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com> Acked-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Guchun Chen [Fri, 26 Nov 2021 05:06:15 +0000 (13:06 +0800)]
drm/amdgpu: fix the missed handling for SDMA2 and SDMA3
There is no base reg offset or ip_version set for SDMA2
and SDMA3 on SIENNA_CICHLID, so add them.
Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Thu, 18 Nov 2021 08:25:19 +0000 (16:25 +0800)]
drm/amdgpu: check atomic flag to differeniate with legacy path
since vkms support atomic KMS interface
Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <aleander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Flora Cui [Wed, 24 Nov 2021 02:33:38 +0000 (10:33 +0800)]
drm/amdgpu: cancel the correct hrtimer on exit
Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lyude Paul [Tue, 30 Nov 2021 21:29:09 +0000 (16:29 -0500)]
drm/i915/dp: Perform 30ms delay after source OUI write
While working on supporting the Intel HDR backlight interface, I noticed
that there's a couple of laptops that will very rarely manage to boot up
without detecting Intel HDR backlight support - even though it's supported
on the system. One example of such a laptop is the Lenovo P17 1st
generation.
Following some investigation Ville Syrjälä did through the docs they have
available to them, they discovered that there's actually supposed to be a
30ms wait after writing the source OUI before we begin setting up the rest
of the backlight interface.
This seems to be correct, as adding this 30ms delay seems to have
completely fixed the probing issues I was previously seeing. So - let's
start performing a 30ms wait after writing the OUI, which we do in a manner
similar to how we keep track of PPS delays (e.g. record the timestamp of
the OUI write, and then wait for however many ms are left since that
timestamp right before we interact with the backlight) in order to avoid
waiting any longer then we need to. As well, this also avoids us performing
this delay on systems where we don't end up using the HDR backlight
interface.
V3:
* Move last_oui_write into intel_dp
V2:
* Move panel delays into intel_pps
Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Fixes: 63a80843db80 ("drm/i915/dp: Enable Intel's HDR backlight interface (only SDR for now)") Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.12+ Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211130212912.212044-1-lyude@redhat.com
(cherry picked from commit c7c90b0b8418a97d3aa8b39aae1992908948efad) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Guangming [Fri, 26 Nov 2021 07:49:04 +0000 (15:49 +0800)]
dma-buf: system_heap: Use 'for_each_sgtable_sg' in pages free flow
For previous version, it uses 'sg_table.nent's to traverse sg_table in pages
free flow.
However, 'sg_table.nents' is reassigned in 'dma_map_sg', it means the number of
created entries in the DMA adderess space.
So, use 'sg_table.nents' in pages free flow will case some pages can't be freed.
Here we should use sg_table.orig_nents to free pages memory, but use the
sgtable helper 'for each_sgtable_sg'(, instead of the previous rather common
helper 'for_each_sg' which maybe cause memory leak) is much better.
Fixes: 29b5dcea2e196 ("dma-buf: system_heap: Allocate higher order pages if available") Signed-off-by: Guangming <Guangming.Cao@mediatek.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Cc: <stable@vger.kernel.org> # 5.11.* Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211126074904.88388-1-guangming.cao@mediatek.com
Lyude Paul [Fri, 5 Nov 2021 18:33:38 +0000 (14:33 -0400)]
drm/i915: Add support for panels with VESA backlights with PWM enable/disable
This simply adds proper support for panel backlights that can be controlled
via VESA's backlight control protocol, but which also require that we
enable and disable the backlight via PWM instead of via the DPCD interface.
We also enable this by default, in order to fix some people's backlights
that were broken by not having this enabled.
For reference, backlights that require this and use VESA's backlight
interface tend to be laptops with hybrid GPUs, but this very well may
change in the future.
v4:
* Make sure that we call intel_backlight_level_to_pwm() in
intel_dp_aux_vesa_enable_backlight() - vsyrjala
Maxime Ripard [Wed, 17 Nov 2021 09:45:27 +0000 (10:45 +0100)]
drm/vc4: kms: Fix previous HVS commit wait
Our current code is supposed to serialise the commits by waiting for all
the drm_crtc_commits associated to the previous HVS state.
However, assuming we have two CRTCs running and being configured and we
configure each one alternately, we end up in a situation where we're
not waiting at all.
Indeed, starting with a state (state 0) where both CRTCs are running,
and doing a commit (state 1) on the first CRTC (CRTC 0), we'll associate
its commit to its assigned FIFO in vc4_hvs_state.
If we get a new commit (state 2), this time affecting the second CRTC
(CRTC 1), the DRM core will allow both commits to execute in parallel
(assuming they don't have any share resources).
Our code in vc4_atomic_commit_tail is supposed to make sure we only get
one commit at a time and serialised by order of submission. It does so
by using for_each_old_crtc_in_state, making sure that the CRTC has a
FIFO assigned, is used, and has a commit pending. If it does, then we'll
wait for the commit before going forward.
During the transition from state 0 to state 1, as our old CRTC state we
get the CRTC 0 state 0, its commit, we wait for it, everything works fine.
During the transition from state 1 to state 2 though, the use of
for_each_old_crtc_in_state is wrong. Indeed, while the code assumes it's
returning the state of the CRTC in the old state (so CRTC 0 state 1), it
actually returns the old state of the CRTC affected by the current
commit, so CRTC 0 state 0 since it wasn't part of state 1.
Due to this, if we alternate between the configuration of CRTC 0 and
CRTC 1, we never actually wait for anything since we should be waiting
on the other every time, but it never is affected by the previous
commit.
Change the logic to, at every commit, look at every FIFO in the previous
HVS state, and if it's in use and has a commit associated to it, wait
for that commit.
Fixes: 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-7-maxime@cerno.tech
Maxime Ripard [Wed, 17 Nov 2021 09:45:26 +0000 (10:45 +0100)]
drm/vc4: kms: Don't duplicate pending commit
Our HVS global state, when duplicated, will also copy the pointer to the
drm_crtc_commit (and increase the reference count) for each FIFO if the
pointer is not NULL.
However, our atomic_setup function will overwrite that pointer without
putting the reference back leading to a memory leak.
Since the commit is only relevant during the atomic commit process, it
doesn't make sense to duplicate the reference to the commit anyway.
Let's remove it.
Fixes: 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-6-maxime@cerno.tech
Maxime Ripard [Wed, 17 Nov 2021 09:45:25 +0000 (10:45 +0100)]
drm/vc4: kms: Clear the HVS FIFO commit pointer once done
Commit 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a
commit") introduced a wait on the previous commit done on a given HVS
FIFO.
However, we never cleared that pointer once done. Since
drm_crtc_commit_put can free the drm_crtc_commit structure directly if
we were the last user, this means that it can lead to a use-after free
if we were to duplicate the state, and that stale pointer would even be
copied to the new state.
Set the pointer to NULL once we're done with the wait so that we don't
carry over a pointer to a free'd structure.
Fixes: 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-5-maxime@cerno.tech
Maxime Ripard [Wed, 17 Nov 2021 09:45:24 +0000 (10:45 +0100)]
drm/vc4: kms: Add missing drm_crtc_commit_put
Commit 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a
commit") introduced a global state for the HVS, with each FIFO storing
the current CRTC commit so that we can properly synchronize commits.
However, the refcounting was off and we thus ended up leaking the
drm_crtc_commit structure every commit. Add a drm_crtc_commit_put to
prevent the leakage.
Fixes: 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-4-maxime@cerno.tech
Maxime Ripard [Wed, 17 Nov 2021 09:45:23 +0000 (10:45 +0100)]
drm/vc4: kms: Fix return code check
The HVS global state functions return an error pointer, but in most
cases we check if it's NULL, possibly resulting in an invalid pointer
dereference.
Fixes: 773085fb64df ("drm/vc4: kms: Wait on previous FIFO users before a commit") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-3-maxime@cerno.tech
Maxime Ripard [Wed, 17 Nov 2021 09:45:22 +0000 (10:45 +0100)]
drm/vc4: kms: Wait for the commit before increasing our clock rate
Several DRM/KMS atomic commits can run in parallel if they affect
different CRTC. These commits share the global HVS state, so we have
some code to make sure we run commits in sequence. This synchronization
code is one of the first thing that runs in vc4_atomic_commit_tail().
Another constraints we have is that we need to make sure the HVS clock
gets a boost during the commit. That code relies on clk_set_min_rate and
will remove the old minimum and set a new one. We also need another,
temporary, minimum for the duration of the commit.
The algorithm is thus to set a temporary minimum, drop the previous
one, do the commit, and finally set the minimum for the current mode.
However, the part that sets the temporary minimum and drops the older
one runs before the commit synchronization code.
Thus, under the proper conditions, we can end up mixing up the minimums
and ending up with the wrong one for our current step.
To avoid it, let's move the clock setup in the protected section.
Fixes: f8ee023a91ff ("drm/vc4: hvs: Boost the core clock during modeset") Signed-off-by: Maxime Ripard <maxime@cerno.tech> Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Tested-by: Jian-Hong Pan <jhp@endlessos.org> Link: https://lore.kernel.org/r/20211117094527.146275-2-maxime@cerno.tech
Gurchetan Singh [Mon, 22 Nov 2021 23:22:09 +0000 (15:22 -0800)]
drm/virtgpu api: define a dummy fence signaled event
The current virtgpu implementation of poll(..) drops events
when VIRTGPU_CONTEXT_PARAM_POLL_RINGS_MASK is enabled (otherwise
it's like a normal DRM driver).
This is because paravirtualized userspaces receives responses in a
buffer of type BLOB_MEM_GUEST, not by read(..).
To be in line with other DRM drivers and avoid specialized behavior,
it is possible to define a dummy event for virtgpu. Paravirtualized
userspace will now have to call read(..) on the DRM fd to receive the
dummy event.
Linus Torvalds [Sun, 28 Nov 2021 19:58:52 +0000 (11:58 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
"Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken
does not seem to work, breaking too many guests in the process. We'll
need to do length validation using some other approach"
[ This merge also ends up reverting commit 7d1dd9f7968b ("vsock/virtio:
suppress used length validation"), which came in through the
networking tree in the meantime, and was part of that whole used
length validation series - Linus ]
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim: avoid putting an uninitialized iova_domain
vhost-vdpa: clean irqs before reseting vdpa device
virtio-blk: modify the value type of num in virtio_queue_rq()
vhost/vsock: cleanup removing `len` variable
vhost/vsock: fix incorrect used length reported to the guest
Revert "virtio_ring: validate used buffer length"
Revert "virtio-net: don't let virtio core to validate used length"
Revert "virtio-blk: don't let virtio core to validate used length"
Revert "virtio-scsi: don't let virtio core to validate used buffer length"
Linus Torvalds [Sun, 28 Nov 2021 17:15:34 +0000 (09:15 -0800)]
Merge tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single scheduler fix to ensure that there is no stale KASAN shadow
state left on the idle task's stack when a CPU is brought up after it
was brought down before"
* tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/scs: Reset task stack state in bringup_cpu()
Linus Torvalds [Sun, 28 Nov 2021 17:10:54 +0000 (09:10 -0800)]
Merge tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for perf to prevent it from sending SIGTRAP to another
task from a trace point event as it's not possible to deliver a
synchronous signal to a different task from there"
* tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Ignore sigtrap for tracepoints destined for other tasks
Linus Torvalds [Sun, 28 Nov 2021 17:04:41 +0000 (09:04 -0800)]
Merge tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two regression fixes for reader writer semaphores:
- Plug a race in the lock handoff which is caused by inconsistency of
the reader and writer path and can lead to corruption of the
underlying counter.
- down_read_trylock() is suboptimal when the lock is contended and
multiple readers trylock concurrently. That's due to the initial
value being read non-atomically which results in at least two
compare exchange loops. Making the initial readout atomic reduces
this significantly. Whith 40 readers by 11% in a benchmark which
enforces contention on mmap_sem"
* tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Optimize down_read_trylock() under highly contended case
locking/rwsem: Make handoff bit handling more consistent
Linus Torvalds [Sun, 28 Nov 2021 16:50:53 +0000 (08:50 -0800)]
Merge tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull another tracing fix from Steven Rostedt:
"Fix the fix of pid filtering
The setting of the pid filtering flag tested the "trace only this pid"
case twice, and ignored the "trace everything but this pid" case.
The 5.15 kernel does things a little differently due to the new sparse
pid mask introduced in 5.16, and as the bug was discovered running the
5.15 kernel, and the first fix was initially done for that kernel,
that fix handled both cases (only pid and all but pid), but the
forward port to 5.16 created this bug"
* tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Test the 'Do not trace this pid' case in create event
Linus Torvalds [Sat, 27 Nov 2021 22:49:35 +0000 (14:49 -0800)]
Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
"Five ksmbd server fixes, four of them for stable:
- memleak fix
- fix for default data stream on filesystems that don't support xattr
- error logging fix
- session setup fix
- minor doc cleanup"
* tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix memleak in get_file_stream_info()
ksmbd: contain default data stream even if xattr is empty
ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
docs: filesystem: cifs: ksmbd: Fix small layout issues
ksmbd: Fix an error handling path in 'smb2_sess_setup()'
Guenter Roeck [Sat, 27 Nov 2021 15:44:41 +0000 (07:44 -0800)]
fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k
NTFS_RW code allocates page size dependent arrays on the stack. This
results in build failures if the page size is 64k or larger.
fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
fs/ntfs/aops.c:1311:1: error:
the frame size of 2240 bytes is larger than 2048 bytes
Since commit 87081c57a8d6 ("powerpc/64s: Default to 64K pages for 64 bit
book3s") this affects ppc:allmodconfig builds, but other architectures
supporting page sizes of 64k or larger are also affected.
Increasing the maximum frame size for affected architectures just to
silence this error does not really help. The frame size would have to
be set to a really large value for 256k pages. Also, a large frame size
could potentially result in stack overruns in this code and elsewhere
and is therefore not desirable. Make NTFS_RW dependent on page sizes
smaller than 64k instead.
NTFS_RW and VMXNET3 require a page size smaller than 64kB. Add generic
Kconfig option for use outside architecture code to avoid architecture
specific Kconfig options in that code.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
tracing: Test the 'Do not trace this pid' case in create event
When creating a new event (via a module, kprobe, eprobe, etc), the
descriptors that are created must add flags for pid filtering if an
instance has pid filtering enabled, as the flags are used at the time the
event is executed to know if pid filtering should be done or not.
The "Only trace this pid" case was added, but a cut and paste error made
that case checked twice, instead of checking the "Trace all but this pid"
case.
Linus Torvalds [Sat, 27 Nov 2021 20:59:54 +0000 (12:59 -0800)]
Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"Fixes for a resource leak and a build robot complaint about totally
dead code:
- Fix buffer resource leak that could lead to livelock on corrupt fs.
- Remove unused function xfs_inew_wait to shut up the build robots"
* tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: remove xfs_inew_wait
xfs: Fix the free logic of state in xfs_attr_node_hasname
Linus Torvalds [Sat, 27 Nov 2021 20:50:03 +0000 (12:50 -0800)]
Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
"A single iomap bug fix and a cleanup for 5.16-rc2.
The bug fix changes how iomap deals with reading from an inline data
region -- whereas the current code (incorrectly) lets the iomap read
iter try for more bytes after reading the inline region (which zeroes
the rest of the page!) and hopes the next iteration terminates, we
surveyed the inlinedata implementations and realized that all
inlinedata implementations also require that the inlinedata region end
at EOF, so we can simply terminate the read.
The second patch documents these assumptions in the code so that
they're not subtle implications anymore, and cleans up some of the
grosser parts of that function.
Summary:
- Fix an accounting problem where unaligned inline data reads can run
off the end of the read iomap iterator. iomap has historically
required that inline data mappings only exist at the end of a file,
though this wasn't documented anywhere.
- Document iomap_read_inline_data and change its return type to be
appropriate for the information that it's actually returning"
* tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: iomap_read_inline_data cleanup
iomap: Fix inline extent handling in iomap_readpage
Linus Torvalds [Sat, 27 Nov 2021 20:03:57 +0000 (12:03 -0800)]
Merge tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two fixes to event pid filtering:
- Make sure newly created events reflect the current state of pid
filtering
- Take pid filtering into account when recording trigger events.
(Also clean up the if statement to be cleaner)"
* tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix pid filtering when triggers are attached
tracing: Check pid filtering when creating events
Linus Torvalds [Sat, 27 Nov 2021 19:28:37 +0000 (11:28 -0800)]
Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block
Pull more io_uring fixes from Jens Axboe:
"The locking fixup that was applied earlier this rc has both a deadlock
and IRQ safety issue, let's get that ironed out before -rc3. This
contains:
- Link traversal locking fix (Pavel)
- Cancelation fix (Pavel)
- Relocate cond_resched() for huge buffer chain freeing, avoiding a
softlockup warning (Ye)
- Fix timespec validation (Ye)"
* tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
io_uring: Fix undefined-behaviour in io_issue_sqe
io_uring: fix soft lockup when call __io_remove_buffers
io_uring: fix link traversal locking
io_uring: fail cancellation for EXITING tasks
Linus Torvalds [Sat, 27 Nov 2021 19:19:42 +0000 (11:19 -0800)]
Merge tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block
Pull more block fixes from Jens Axboe:
"Turns out that the flushing out of pending fixes before the
Thanksgiving break didn't quite work out in terms of timing, so here's
a followup set of fixes:
- rq_qos_done() should be called regardless of whether or not we're
the final put of the request, it's not related to the freeing of
the state. This fixes an IO stall with wbt that a few users have
reported, a regression in this release.
- Only define zram_wb_devops if it's used, fixing a compilation
warning for some compilers"
* tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK
block: call rq_qos_done() before ref check in batch completions
Linus Torvalds [Sat, 27 Nov 2021 19:15:17 +0000 (11:15 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Twelve fixes, eleven in drivers (target, qla2xx, scsi_debug, mpt3sas,
ufs). The core fix is a minor correction to the previous state update
fix for the iscsi daemons"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_debug: Zero clear zones at reset write pointer
scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
scsi: target: configfs: Delete unnecessary checks for NULL
scsi: target: core: Use RCU helpers for INQUIRY t10_alua_tg_pt_gp
scsi: mpt3sas: Fix incorrect system timestamp
scsi: mpt3sas: Fix system going into read-only mode
scsi: mpt3sas: Fix kernel panic during drive powercycle test
scsi: ufs: ufs-mediatek: Add put_device() after of_find_device_by_node()
scsi: scsi_debug: Fix type in min_t to avoid stack OOB
scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo()
scsi: ufs: ufshpb: Fix warning in ufshpb_set_hpb_read_to_upiu()
Linus Torvalds [Sat, 27 Nov 2021 18:33:55 +0000 (10:33 -0800)]
Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
"Highlights include:
Stable fixes:
- NFSv42: Fix pagecache invalidation after COPY/CLONE
Bugfixes:
- NFSv42: Don't fail clone() just because the server failed to return
post-op attributes
- SUNRPC: use different lockdep keys for INET6 and LOCAL
- NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION
- SUNRPC: fix header include guard in trace header"
* tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: use different lock keys for INET6 and LOCAL
sunrpc: fix header include guard in trace header
NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION
NFSv42: Fix pagecache invalidation after COPY/CLONE
NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
NFSv42: Don't fail clone() unless the OP_CLONE operation failed
Linus Torvalds [Sat, 27 Nov 2021 18:06:15 +0000 (10:06 -0800)]
Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix KVM using a Power9 instruction on earlier CPUs, which could lead
to the host SLB being incorrectly invalidated and a subsequent host
crash.
Fix kernel hardlockup on vmap stack overflow on 32-bit.
Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"
* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32: Fix hardlockup on vmap stack overflow
KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
Linus Torvalds [Sat, 27 Nov 2021 17:50:31 +0000 (09:50 -0800)]
Merge tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- build fix for ZSTD enabled configs
- fix for preempt warning
- fix for loongson FTLB detection
- fix for page table level selection
* tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
MIPS: loongson64: fix FTLB configuration
MIPS: Fix using smp_processor_id() in preemptible in show_cpuinfo()
MIPS: boot/compressed/: add __ashldi3 to target for ZSTD compression
Ye Bin [Thu, 18 Nov 2021 01:59:07 +0000 (09:59 +0800)]
io_uring: Fix undefined-behaviour in io_issue_sqe
We got issue as follows:
================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow:
-4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x170/0x1dc lib/dump_stack.c:118
ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
ktime_set include/linux/ktime.h:42 [inline]
timespec64_to_ktime include/linux/ktime.h:78 [inline]
io_timeout fs/io_uring.c:5153 [inline]
io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
__io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
io_submit_sqe fs/io_uring.c:6137 [inline]
io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
__do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
__se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
__arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================
As ktime_set only judge 'secs' if big than KTIME_SEC_MAX, but if we pass
negative value maybe lead to overflow.
To address this issue, we must check if 'sec' is negative.
The reason above issue is 'buf->list' has 2,100,000 nodes, occupied cpu lead
to soft lockup.
To solve this issue, we need add schedule point when do while loop in
'__io_remove_buffers'.
After add schedule point we do regression, get follow data.
[ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...
tracing: Fix pid filtering when triggers are attached
If a event is filtered by pid and a trigger that requires processing of
the event to happen is a attached to the event, the discard portion does
not take the pid filtering into account, and the event will then be
recorded when it should not have been.
Alex Williamson [Fri, 26 Nov 2021 13:55:56 +0000 (21:55 +0800)]
iommu/vt-d: Fix unmap_pages support
When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap. VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.
However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments. If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.
As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level. We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop an pop back up a level. When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level. In this
case the level size is 256K pfns, which we add to the base pfn and
get a results of 0x7fe00, which is clearly greater than 0x401ff,
so we're done. Meanwhile we never cleared the ptes for the remainder
of the range. When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.
The fix for this seems relatively simple, if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.
iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock()
If we return -EOPNOTSUPP, the rcu lock remains lock. This is spurious.
Go through the end of the function instead. This way, the missing
'rcu_read_unlock()' is called.
Alex Bee [Wed, 24 Nov 2021 02:13:25 +0000 (03:13 +0100)]
iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568
With the submission of iommu driver for RK3568 a subtle bug was
introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
the other way arround - that leads to random errors, especially when
addresses beyond 32 bit are used.
Fix it.
Fixes: 538f003ea048 ("iommu: rockchip: Add support for iommu v2") Signed-off-by: Alex Bee <knaerzche@gmail.com> Tested-by: Peter Geis <pgwipeout@gmail.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Dan Johansen <strit@manjaro.org> Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
The messages printed on the initialization of the AMD IOMMUv2 driver
have caused some confusion in the past. Clarify the messages to lower
the confusion in the future.
- netfilter: ctnetlink: fix error codes and flags used for kernel
side filtering of dumps
- netfilter: flowtable: fix IPv6 tunnel addr match
- ncsi: align payload to 32-bit to fix dropped packets
- iavf: fix deadlock and loss of config during VF interface reset
- ice: avoid bpf_prog refcount underflow
- ocelot: fix broken PTP over IP and PTP API violations
Misc:
- marvell: mvpp2: increase MTU limit when XDP enabled"
* tag 'net-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
net: dsa: microchip: implement multi-bridge support
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
net: mscc: ocelot: set up traps for PTP packets
net: ptp: add a definition for the UDP port for IEEE 1588 general messages
net: mscc: ocelot: create a function that replaces an existing VCAP filter
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
net: hns3: fix incorrect components info of ethtool --reset command
net: hns3: fix one incorrect value of page pool info when queried by debugfs
net: hns3: add check NULL address for page pool
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
net: qed: fix the array may be out of bound
net/smc: Don't call clcsock shutdown twice when smc shutdown
net: vlan: fix underflow for the real_dev refcnt
ptp: fix filter names in the documentation
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
nfc: virtual_ncidev: change default device permissions
net/sched: sch_ets: don't peek at classes beyond 'nbands'
net: stmmac: Disable Tx queues when reconfiguring the interface
selftests: tls: test for correct proto_ops
tls: fix replacing proto_ops
...
Oleksij Rempel [Fri, 26 Nov 2021 12:39:26 +0000 (13:39 +0100)]
net: dsa: microchip: implement multi-bridge support
Current driver version is able to handle only one bridge at time.
Configuring two bridges on two different ports would end up shorting this
bridges by HW. To reproduce it:
ip l a name br0 type bridge
ip l a name br1 type bridge
ip l s dev br0 up
ip l s dev br1 up
ip l s lan1 master br0
ip l s dev lan1 up
ip l s lan2 master br1
ip l s dev lan2 up
Ping on lan1 and get response on lan2, which should not happen.
This happened, because current driver version is storing one global "Port VLAN
Membership" and applying it to all ports which are members of any
bridge.
To solve this issue, we need to handle each port separately.
This patch is dropping the global port member storage and calculating
membership dynamically depending on STP state and bridge participation.
Note: STP support was broken before this patch and should be fixed
separately.
Fixes: 8135f56c18a2 ("net: dsa: microchip: break KSZ9477 DSA driver into two files") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 20:19:13 +0000 (12:19 -0800)]
Merge tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a NULL pointer dereference in the CPPC library code and a
locking issue related to printing the names of ACPI device nodes in
the device properties framework.
Specifics:
- Fix NULL pointer dereference in the CPPC library code occuring on
hybrid systems without CPPC support (Rafael Wysocki).
- Avoid attempts to acquire a semaphore with interrupts off when
printing the names of ACPI device nodes and clean up code on top of
that fix (Sakari Ailus)"
* tag 'acpi-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Add NULL pointer check to cppc_get_perf()
ACPI: Make acpi_node_get_parent() local
ACPI: Get acpi_device's parent from the parent field
Linus Torvalds [Fri, 26 Nov 2021 20:14:50 +0000 (12:14 -0800)]
Merge tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These address three issues in the intel_pstate driver and fix two
problems related to hibernation.
Specifics:
- Make intel_pstate work correctly on Ice Lake server systems with
out-of-band performance control enabled (Adamos Ttofari).
- Fix EPP handling in intel_pstate during CPU offline and online in
the active mode (Rafael Wysocki).
- Make intel_pstate support ITMT on asymmetric systems with
overclocking enabled (Srinivas Pandruvada).
- Fix hibernation image saving when using the user space interface
based on the snapshot special device file (Evan Green).
- Make the hibernation code release the snapshot block device using
the same mode that was used when acquiring it (Thomas Zeitlhofer)"
* tag 'pm-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: Fix snapshot partial write lengths
PM: hibernate: use correct mode for swsusp_close()
cpufreq: intel_pstate: ITMT support for overclocked system
cpufreq: intel_pstate: Fix active mode offline/online EPP handling
cpufreq: intel_pstate: Add Ice Lake server to out-of-band IDs
Linus Torvalds [Fri, 26 Nov 2021 20:01:31 +0000 (12:01 -0800)]
Merge tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
"Fix a regression caused by a bugfix in the previous release. The
symptom is a VM_BUG_ON triggered from splice to the fuse device.
Unfortunately the original bugfix was already backported to a number
of stable releases, so this fix-fix will need to be backported as
well"
* tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: release pipe buf after last use
====================
Fix broken PTP over IP on Ocelot switches
Changes in v2: added patch 5, added Richard's ack for the whole series
sans patch 5 which is new.
Po Liu reported recently that timestamping PTP over IPv4 is broken using
the felix driver on NXP LS1028A. This has been known for a while, of
course, since it has always been broken. The reason is because IP PTP
packets are currently treated as unknown IP multicast, which is not
flooded to the CPU port in the ocelot driver design, so packets don't
reach the ptp4l program.
The series solves the problem by installing packet traps per port when
the timestamping ioctl is called, depending on the RX filter selected
(L2, L4 or both).
====================
Vladimir Oltean [Fri, 26 Nov 2021 17:28:45 +0000 (19:28 +0200)]
net: mscc: ocelot: correctly report the timestamping RX filters in ethtool
The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.
Fixes: c9a4992060bc ("net: mscc: PTP Hardware Clock (PHC) support") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:44 +0000 (19:28 +0200)]
net: mscc: ocelot: set up traps for PTP packets
IEEE 1588 support was declared too soon for the Ocelot switch. Out of
reset, this switch does not apply any special treatment for PTP packets,
i.e. when an event message is received, the natural tendency is to
forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
is under a bridge, since user space application stacks (written
primarily for endpoint ports, not switches) like ptp4l expect that PTP
messages are always received on AF_PACKET / AF_INET sockets (depending
on the PTP transport being used), and never being autonomously
forwarded. Any forwarding, if necessary (for example in Transparent
Clock mode) is handled in software by ptp4l. Having the hardware forward
these packets too will cause duplicates which will confuse endpoints
connected to these switches.
So PTP over L2 barely works, in the sense that PTP packets reach the CPU
port, but they reach it via flooding, and therefore reach lots of other
unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
This is because the Ocelot switch have a separate destination port mask
for unknown IP multicast (which PTP over IP is) flooding compared to
unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
the driver allows the CPU port to be in the PGID_MC port group, but not
in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
switches is not very powerful at all, so every penny they could save by
not allowing flooding to the CPU port module matters. Unknown IP
multicast did not make it.
The de facto consensus is that when a switch is PTP-aware and an
application stack for PTP is running, switches should have some sort of
trapping mechanism for PTP packets, to extract them from the hardware
data path. This avoids both problems:
(a) PTP packets are no longer flooded to unwanted destinations
(b) PTP over IP packets are no longer denied from reaching the CPU since
they arrive there via a trap and not via flooding
It is not the first time when this change is attempted. Last time, the
feedback from Allan Nielsen and Andrew Lunn was that the traps should
not be installed by default, and that PTP-unaware switching may be
desired for some use cases:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
To address that feedback, the present patch adds the necessary packet
traps according to the RX filter configuration transmitted by user space
through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
keep 5 filters, which are amended each time RX timestamping is enabled
or disabled on a port:
- 1 for PTP over L2
- 2 for PTP over IPv4 (UDP ports 319 and 320)
- 2 for PTP over IPv6 (UDP ports 319 and 320)
The cookie by which these filters (invisible to tc) are identified is
strategically chosen such that it does not collide with the filters used
for the ocelot-8021q tagging protocol by the Felix driver, or with the
MRP traps set up by the Ocelot library.
Other alternatives were considered, like patching user space to do
something, but there are so many ways in which PTP packets could be made
to reach the CPU, generically speaking, that "do what?" is a very valid
question. The ptp4l program from the linuxptp stack already attempts to
do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
a dev_mc_add() on the interface, in the kernel:
https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
Reality shows that this is not sufficient in case the interface belongs
to a switchdev driver, as dev_mc_add() does not show the intention to
trap a packet to the CPU, but rather the intention to not drop it (it is
strictly for RX filtering, same as promiscuous does not mean to send all
traffic to the CPU, but to not drop traffic with unknown MAC DA). This
topic is a can of worms in itself, and it would be great if user space
could just stay out of it.
On the other hand, setting up PTP traps privately within the driver is
not new by any stretch of the imagination:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
So this is the approach taken here as well. The difference here being
that we prepare and destroy the traps per port, dynamically at runtime,
as opposed to driver init time, because apparently, PTP-unaware
forwarding is a use case.
Fixes: c9a4992060bc ("net: mscc: PTP Hardware Clock (PHC) support") Reported-by: Po Liu <po.liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:43 +0000 (19:28 +0200)]
net: ptp: add a definition for the UDP port for IEEE 1588 general messages
As opposed to event messages (Sync, PdelayReq etc) which require
timestamping, general messages (Announce, FollowUp etc) do not.
In PTP they are part of different streams of data.
IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
destination port assigned by IANA is 319 for event messages, and 320 for
general messages. Yet the kernel seems to be missing the definition for
general messages. This patch adds it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:42 +0000 (19:28 +0200)]
net: mscc: ocelot: create a function that replaces an existing VCAP filter
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
tc flower offload on ocelot, among other things. The ingress port mask
on which VCAP rules match is present as a bit field in the actual key of
the rule. This means that it is possible for a rule to be shared among
multiple source ports. When the rule is added one by one on each desired
port, that the ingress port mask of the key must be edited and rewritten
to hardware.
But the API in ocelot_vcap.c does not allow for this. For one thing,
ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
because ocelot_vcap_filter_add() works with a preallocated and
prepopulated filter and programs it to hardware, and
ocelot_vcap_filter_del() does both the job of removing the specified
filter from hardware, as well as kfreeing it. That is to say, the only
option of editing a filter in place, which is to delete it, modify the
structure and add it back, does not work because it results in
use-after-free.
This patch introduces ocelot_vcap_filter_replace, which trivially
reprograms a VCAP entry to hardware, at the exact same index at which it
existed before, without modifying any list or allocating any memory.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vladimir Oltean [Fri, 26 Nov 2021 17:28:41 +0000 (19:28 +0200)]
net: mscc: ocelot: don't downgrade timestamping RX filters in SIOCSHWTSTAMP
The ocelot driver, when asked to timestamp all receiving packets, 1588
v1 or NTP, says "nah, here's 1588 v2 for you".
According to this discussion:
https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
drivers that downgrade from a wider request to a narrower response (or
even a response where the intersection with the request is empty) are
buggy, and should return -ERANGE instead. This patch fixes that.
Fixes: c9a4992060bc ("net: mscc: PTP Hardware Clock (PHC) support") Suggested-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jie Wang [Fri, 26 Nov 2021 12:03:18 +0000 (20:03 +0800)]
net: hns3: fix incorrect components info of ethtool --reset command
Currently, HNS3 driver doesn't clear the reset flags of components after
successfully executing reset, it causes userspace info of
"Components reset" and "Components not reset" is incorrect.
So fix this problem by clear corresponding reset flag after reset process.
Fixes: 71efd7b1ab84 ("net: hns3: add support for triggering reset by ethtool") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Hao Chen [Fri, 26 Nov 2021 12:03:17 +0000 (20:03 +0800)]
net: hns3: fix one incorrect value of page pool info when queried by debugfs
Currently, when user queries page pool info by debugfs command
"cat page_pool_info", the cnt of allocated page for page pool may be
incorrect because of memory inconsistency problem caused by compiler
optimization.
So this patch uses READ_ONCE() to read value of pages_state_hold_cnt to
fix this problem.
Fixes: f9b81c4439c4 ("net: hns3: debugfs add support dumping page pool info") Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Guangbin Huang [Fri, 26 Nov 2021 12:03:15 +0000 (20:03 +0800)]
net: hns3: fix VF RSS failed problem after PF enable multi-TCs
When PF is set to multi-TCs and configured mapping relationship between
priorities and TCs, the hardware will active these settings for this PF
and its VFs.
In this case when VF just uses one TC and its rx packets contain priority,
and if the priority is not mapped to TC0, as other TCs of VF is not valid,
hardware always put this kind of packets to the queue 0. It cause this kind
of packets of VF can not be used RSS function.
To fix this problem, set tc mode of all unused TCs of VF to the setting of
TC0, then rx packet with priority which map to unused TC will be direct to
TC0.
When pid filtering is activated in an instance, all of the events trace
files for that instance has the PID_FILTER flag set. This determines
whether or not pid filtering needs to be done on the event, otherwise the
event is executed as normal.
If pid filtering is enabled when an event is created (via a dynamic event
or modules), its flag is not updated to reflect the current state, and the
events are not filtered properly.
Tony Lu [Fri, 26 Nov 2021 02:41:35 +0000 (10:41 +0800)]
net/smc: Don't call clcsock shutdown twice when smc shutdown
When applications call shutdown() with SHUT_RDWR in userspace,
smc_close_active() calls kernel_sock_shutdown(), and it is called
twice in smc_shutdown().
This fixes this by checking sk_state before do clcsock shutdown, and
avoids missing the application's call of smc_shutdown().
Ziyang Xuan [Fri, 26 Nov 2021 01:59:42 +0000 (09:59 +0800)]
net: vlan: fix underflow for the real_dev refcnt
Inject error before dev_hold(real_dev) in register_vlan_dev(),
and execute the following testcase:
ip link add dev dummy1 type dummy
ip link add name dummy1.100 link dummy1 type vlan id 100
ip link del dev dummy1
When the dummy netdevice is removed, we will get a WARNING as following:
=======================================================================
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 2 PID: 0 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0
and an endless loop of:
=======================================================================
unregister_netdevice: waiting for dummy1 to become free. Usage count = -1073741824
That is because dev_put(real_dev) in vlan_dev_free() be called without
dev_hold(real_dev) in register_vlan_dev(). It makes the refcnt of real_dev
underflow.
Move the dev_hold(real_dev) to vlan_dev_init() which is the call-back of
ndo_init(). That makes dev_hold() and dev_put() for vlan's real_dev
symmetrical.
Fixes: 52f553199741 ("net: vlan: fix a UAF in vlan_dev_real_dev()") Reported-by: Petr Machata <petrm@nvidia.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Link: https://lore.kernel.org/r/20211126015942.2918542-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Julian Wiedmann [Fri, 26 Nov 2021 17:55:43 +0000 (18:55 +0100)]
ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()
ethtool_set_coalesce() now uses both the .get_coalesce() and
.set_coalesce() callbacks. But the check for their availability is
buggy, so changing the coalesce settings on a device where the driver
provides only _one_ of the callbacks results in a NULL pointer
dereference instead of an -EOPNOTSUPP.
Fix the condition so that the availability of both callbacks is
ensured. This also matches the netlink code.
Note that reproducing this requires some effort - it only affects the
legacy ioctl path, and needs a specific combination of driver options:
- have .get_coalesce() and .coalesce_supported but no
.set_coalesce(), or
- have .set_coalesce() but no .get_coalesce(). Here eg. ethtool doesn't
cause the crash as it first attempts to call ethtool_get_coalesce()
and bails out on error.
Fixes: 8c9f3b32dcfc ("ethtool: extend coalesce setting uAPI with CQE mode") Cc: Yufeng Mo <moyufeng@huawei.com> Cc: Huazhong Tan <tanhuazhong@huawei.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Davide Caratti [Wed, 24 Nov 2021 16:14:40 +0000 (17:14 +0100)]
net/sched: sch_ets: don't peek at classes beyond 'nbands'
when the number of DRR classes decreases, the round-robin active list can
contain elements that have already been freed in ets_qdisc_change(). As a
consequence, it's possible to see a NULL dereference crash, caused by the
attempt to call cl->qdisc->ops->peek(cl->qdisc) when cl->qdisc is NULL:
v3: fix race between ets_qdisc_change() and ets_qdisc_dequeue() delisting
DRR classes beyond 'nbands' in ets_qdisc_change() with the qdisc lock
acquired, thanks to Cong Wang.
v2: when a NULL qdisc is found in the DRR active list, try to dequeue skb
from the next list item.
Yannick Vignon [Wed, 24 Nov 2021 15:47:31 +0000 (16:47 +0100)]
net: stmmac: Disable Tx queues when reconfiguring the interface
The Tx queues were not disabled in situations where the driver needed to
stop the interface to apply a new configuration. This could result in a
kernel panic when doing any of the 3 following actions:
* reconfiguring the number of queues (ethtool -L)
* reconfiguring the size of the ring buffers (ethtool -G)
* installing/removing an XDP program (ip l set dev ethX xdp)
Prevent the panic by making sure netif_tx_disable is called when stopping
an interface.
Without this patch, the following kernel panic can be observed when doing
any of the actions above:
Unable to handle kernel paging request at virtual address ffff80001238d040
[....]
Call trace:
dwmac4_set_addr+0x8/0x10
dev_hard_start_xmit+0xe4/0x1ac
sch_direct_xmit+0xe8/0x39c
__dev_queue_xmit+0x3ec/0xaf0
dev_queue_xmit+0x14/0x20
[...]
[ end trace 0000000000000002 ]---
Fixes: f43adc39cc614 ("net: stmmac: Add initial XDP support") Fixes: 53da2eeb88818 ("net: stmmac: Add support to Ethtool get/set ring parameters") Fixes: 229a0e8918f93 ("net: stmmac: add ethtool support for get/set channels") Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com> Link: https://lore.kernel.org/r/20211124154731.1676949-1-yannick.vignon@oss.nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Linus Torvalds [Fri, 26 Nov 2021 18:27:43 +0000 (10:27 -0800)]
Merge tag 'staging-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging fixes from Greg KH:
"Here are some small staging driver fixes and one driver removal for
5.16-rc3.
The fixes resolve a number of small issues found in 5.16-rc1, nothing
huge at all. The driver removal was due to a platform being removed in
5.16-rc1, but this driver was forgotten about. It wasn't being built
anymore so it's safe to delete.
All have been in linux-next for a while with no reported problems"
* tag 'staging-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8192e: Fix use after free in _rtl92e_pci_disconnect()
staging: greybus: Add missing rwsem around snd_ctl_remove() calls
staging: Remove Netlogic XLP network driver
staging: r8188eu: fix a memory leak in rtw_wx_read32()
staging: r8188eu: use GFP_ATOMIC under spinlock
staging: r8188eu: Use kzalloc() with GFP_ATOMIC in atomic context
staging/fbtft: Fix backlight
staging: r8188eu: Fix breakage introduced when 5G code was removed
Linus Torvalds [Fri, 26 Nov 2021 17:59:55 +0000 (09:59 -0800)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"I2C has an interrupt storm fix for the i801, better timeout handling
for the new virtio driver, and some documentation fixes this time"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
docs: i2c: smbus-protocol: mention the repeated start condition
i2c: virtio: disable timeout handling
i2c: i801: Fix interrupt storm from SMB_ALERT signal
i2c: i801: Restore INTREN on unload
dt-bindings: i2c: imx-lpi2c: Fix i.MX 8QM compatible matching
Linus Torvalds [Fri, 26 Nov 2021 17:54:13 +0000 (09:54 -0800)]
Merge tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- Kconfig fix to make it possible to control building of the privcmd
driver
- three fixes for issues identified by the kernel test robot
- a five-patch series to simplify timeout handling for Xen PV driver
initialization
- two patches to fix error paths in xenstore/xenbus driver
initialization
* tag 'for-linus-5.16c-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: make HYPERVISOR_set_debugreg() always_inline
xen: make HYPERVISOR_get_debugreg() always_inline
xen: detect uninitialized xenbus in xenbus_init
xen: flag xen_snd_front to be not essential for system boot
xen: flag pvcalls-front to be not essential for system boot
xen: flag hvc_xen to be not essential for system boot
xen: flag xen_drm_front to be not essential for system boot
xen: add "not_essential" flag to struct xenbus_driver
xen/pvh: add missing prototype to header
xen: don't continue xenstore initialization in case of errors
xen/privcmd: make option visible in Kconfig
Linus Torvalds [Fri, 26 Nov 2021 17:30:24 +0000 (09:30 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Three arm64 fixes.
The main one is a fix to the way in which we evaluate the macro
arguments to our uaccess routines, which we _think_ might be the root
cause behind some unkillable tasks we've seen in the Android arm64 CI
farm (testing is ongoing). In any case, it's worth fixing.
Other than that, we've toned down an over-zealous VM_BUG_ON() and
fixed ftrace stack unwinding in a bunch of cases.
Summary:
- Evaluate uaccess macro arguments outside of the critical section
- Tighten up VM_BUG_ON() in pmd_populate_kernel() to avoid false positive
- Fix ftrace stack unwinding using HAVE_FUNCTION_GRAPH_RET_ADDR_PTR"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: uaccess: avoid blocking within critical sections
arm64: mm: Fix VM_BUG_ON(mm != &init_mm) for trans_pgd
arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
Jens Axboe [Fri, 26 Nov 2021 16:53:23 +0000 (09:53 -0700)]
block: call rq_qos_done() before ref check in batch completions
We need to call rq_qos_done() regardless of whether or not we're freeing
the request or not, as the reference count doesn't cover the IO completion
tracking.
Fixes: caf498e6f609 ("block: add support for blk_mq_end_request_batch()") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reported-by: Kenneth R. Crudup <kenny@panix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
4f5f48ebd6244 ("io_uring: correct link-list traversal locking") fixed a
data race but introduced a possible deadlock and inconsistentcy in irq
states. E.g.
Another type of problem is freeing a request while holding
->timeout_lock, which may leads to a deadlock in
io_commit_cqring() -> io_flush_timeouts() and other places.
Having 3 nested locks is also too ugly. Add io_match_task_safe(), which
would briefly take and release timeout_lock for race prevention inside,
so the actuall request cancellation / free / etc. code doesn't have it
taken.