Andrew Price [Tue, 22 Mar 2022 19:05:51 +0000 (19:05 +0000)]
gfs2: Make sure FITRIM minlen is rounded up to fs block size
Per fstrim(8) we must round up the minlen argument to the fs block size.
The current calculation doesn't take into account devices that have a
discard granularity and requested minlen less than 1 fs block, so the
value can get shifted away to zero in the translation to fs blocks.
The zero minlen passed to gfs2_rgrp_send_discards() then allows
sb_issue_discard() to be called with nr_sects == 0 which returns -EINVAL
and results in gfs2_rgrp_send_discards() returning -EIO.
Make sure minlen is never < 1 fs block by taking the max of the
requested minlen and the fs block size before comparing to the device's
discard granularity and shifting to fs blocks.
Fixes: 2ad9559e9aae9 ("GFS2: Fix FITRIM argument handling") Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When direct writes fail with -ENOTBLK because we're writing into a
hole (gfs2_iomap_begin()) or because of a page invalidation failure
(iomap_dio_rw()), we're falling back to buffered writes. In that case,
when we lose the inode glock in gfs2_file_buffered_write(), we want to
re-acquire it instead of returning a short write.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Function iomap_dio_rw() only returns -ENOTBLK for write requests and
gfs2_file_direct_read() no longer returns -ENOTBLK since commit 7e8d56e8b83aa ("gfs2: Use iomap for stuffed direct I/O reads"), so there
is no need to check for -ENOTBLK in gfs2_file_read_iter() anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
gfs2: Disable page faults during lockless buffered reads
During lockless buffered reads, filemap_read() holds page cache page
references while trying to copy data to the user-space buffer. The
calling process isn't holding the inode glock, but the page references
it holds prevent those pages from being removed from the page cache, and
that prevents the underlying inode glock from being moved to another
node. Thus, we can end up in the same kinds of distributed deadlock
situations as with normal (non-lockless) buffered reads.
Fix that by disabling page faults during lockless reads as well.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Fix the fault-in window size logic:
* Use a maximum window size of 1 MiB instead of BIO_MAX_VECS * PAGE_SIZE.
The previous window size was always one page because the pages variable
was accidentally being defined and then redefined in
should_fault_in_pages().
* The nr_dirtied heuristic for guessing when there might be memory
pressure often results in very small window sizes. Don't let
nr_dirtied drop below 8 pages (as btrfs does).
* Compute the window size in units of bytes, not pages.
* Account for page overlap (unaligned iterators).
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
The gh_error field if a glock holder is initialized to zero in
gfs2_holder_init(). When a locking operation fails, gh_error is set to
an error code; when it succeeds, the gh_error value is left unchanged.
The field isn't initialized in gfs2_holder_reinit(), which is a problem.
Instead of fixing that directly, initialize gh_error in gfs2_glock_nq().
That also obsoletes the assignment in do_flock().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
This patch tries to fix the continual ABBA deadlocks we keep having
between the iopen and inode glocks. This switches the lock order in
gfs2_inode_lookup and gfs2_create_inode so the iopen glock is always
locked first.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
The gfs2 evict code tries to upgrade the iopen glock from SH to EX. If
the attempt to upgrade times out, gfs2 needs to tell dlm to cancel the
lock request or it can deadlock. We also need to wake up the process
waiting for the lock when dlm sends its AST back to gfs2.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
gfs2: Expect -EBUSY after canceling dlm locking requests
Due to the asynchronous nature of the dlm api, when we request a pending
locking request to be canceled with dlm_unlock(DLM_LKF_CANCEL), the
locking request will either complete before it could be canceled, or the
cancellation will succeed. In either case, gdlm_ast will be called once
and the status will indicate the outcome of the locking request, with
-DLM_ECANCEL indicating a canceled request.
Inside dlm, when a locking request completes before its cancel request
could be processed, gdlm_ast will be called, but the lock will still be
considered busy until a DLM_MSG_CANCEL_REPLY message completes the
cancel request. During that time, successive dlm_lock() or dlm_unlock()
requests for that lock will return -EBUSY. In other words, waiting for
the gdlm_ast call before issuing the next locking request is not enough.
There is no way of waiting for a cancel request to actually complete,
either.
We rarely cancel locking requests, but when we do, we don't know when
the next locking request for that lock will occur. This means that any
dlm_lock() or dlm_unlock() call can potentially return -EBUSY. When
that happens, this patch simply repeats the request after a short pause.
This workaround could be improved upon by tracking for which dlm locks
cancel requests have been issued, but that isn't strictly necessary and
it would complicate the code. We haven't seen -EBUSY errors from dlm
without cancel requests.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get
rid of any reservations the inode may have. Instead, it should pass in
the inode's write count as the second parameter to allow
gfs2_rs_delete() to figure out if the inode has any writers left.
In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left
where we know that there can be no other users of the inode. Replace
those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write
count check.
With that, gfs2_rs_delete() is only called with the inode's actual write
count, so get rid of the second parameter.
Fixes: fe5322fb8db9 ("GFS2: Make rgrp reservations part of the gfs2_inode structure") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Bob Peterson [Mon, 17 Jan 2022 15:25:07 +0000 (10:25 -0500)]
gfs2: assign rgrp glock before compute_bitstructs
Before this patch, function read_rindex_entry called compute_bitstructs
before it allocated a glock for the rgrp. But if compute_bitstructs found
a problem with the rgrp, it called gfs2_consist_rgrpd, and that called
gfs2_dump_glock for rgd->rd_gl which had not yet been assigned.
read_rindex_entry
compute_bitstructs
gfs2_consist_rgrpd
gfs2_dump_glock <---------rgd->rd_gl was not set.
This patch changes read_rindex_entry so it assigns an rgrp glock before
calling compute_bitstructs so gfs2_dump_glock does not reference an
unassigned pointer. If an error is discovered, the glock must also be
put, so a new goto and label were added.
Reported-by: syzbot+c6fd14145e2f62ca0784@syzkaller.appspotmail.com Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Linus Torvalds [Sun, 13 Feb 2022 19:58:11 +0000 (11:58 -0800)]
Merge tag 'kbuild-fixes-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix the truncated path issue for HAVE_GCC_PLUGINS test in Kconfig
- Move -Wunsligned-access to W=1 builds to avoid sprinkling warnings
for the latest Clang
- Fix missing fclose() in Kconfig
- Fix Kconfig to touch dep headers correctly when KCONFIG_AUTOCONFIG is
overridden.
* tag 'kbuild-fixes-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix failing to generate auto.conf
kconfig: fix missing fclose() on error paths
Makefile.extrawarn: Move -Wunaligned-access to W=1
kconfig: let 'shell' return enough output for deep path names
Linus Torvalds [Sun, 13 Feb 2022 18:06:40 +0000 (10:06 -0800)]
Merge tag 'irq-urgent-2022-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Interrupt chip driver fixes:
- Don't install an hotplug notifier for GICV3-ITS on systems which do
not need it to prevent a warning in the notifier about inconsistent
state
- Add the missing device tree matching for the T-HEAD PLIC variant so
the related SoC is properly supported"
* tag 'irq-urgent-2022-02-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Add missing thead,c900-plic match string
dt-bindings: update riscv plic compatible string
irqchip/gic-v3-its: Skip HP notifier when no ITS is registered
Linus Torvalds [Sun, 13 Feb 2022 17:43:34 +0000 (09:43 -0800)]
Merge tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
"Fix a case where objtool would mistakenly warn about instructions
being unreachable"
* tag 'objtool_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm
Linus Torvalds [Sun, 13 Feb 2022 17:22:52 +0000 (09:22 -0800)]
Merge tag 'x86_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Borislav Petkov:
"Prevent softlockups when tearing down large SGX enclaves"
* tag 'x86_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Silence softlockup detection when releasing large enclaves
Linus Torvalds [Sun, 13 Feb 2022 17:16:45 +0000 (09:16 -0800)]
Merge tag '5.17-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Three small smb3 reconnect fixes and an error log clarification"
* tag '5.17-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: mark sessions for reconnection in helper function
cifs: call helper functions for marking channels for reconnect
cifs: call cifs_reconnect when a connection is marked
[smb3] improve error message when mount options conflict with posix
Linus Torvalds [Sat, 12 Feb 2022 18:29:02 +0000 (10:29 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes in the lpfc driver. One changing the classification of
trace messages and the other fixing a build issue when NVME_FC is
disabled"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: lpfc: Reduce log messages seen after firmware download
scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
Linus Torvalds [Sat, 12 Feb 2022 18:01:55 +0000 (10:01 -0800)]
Merge tag 'tty-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are four small tty/serial fixes for 5.17-rc4. They are:
- 8250_pericom change revert to fix a reported regression
- two speculation fixes for vt_ioctl
- n_tty regression fix for polling
All of these have been in linux-next for a while with no reported
issues"
* tag 'tty-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
vt_ioctl: add array_index_nospec to VT_ACTIVATE
vt_ioctl: fix array_index_nospec in vt_setactivate
serial: 8250_pericom: Revert "Re-enable higher baud rates"
n_tty: wake up poll(POLLRDNORM) on receiving data
Linus Torvalds [Sat, 12 Feb 2022 17:56:18 +0000 (09:56 -0800)]
Merge tag 'usb-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for 5.17-rc4 that resolve some
reported issues and add new device ids:
- usb-serial new device ids
- ulpi cleanup fixes
- f_fs use-after-free fix
- dwc3 driver fixes
- ax88179_178a usb network driver fix
- usb gadget fixes
There is a revert at the end of this series to resolve a build problem
that 0-day found yesterday. Most of these have been in linux-next,
except for the last few, and all have now passed 0-day tests"
* tag 'usb-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "usb: dwc2: drd: fix soft connect when gadget is unconfigured"
usb: dwc2: drd: fix soft connect when gadget is unconfigured
usb: gadget: rndis: check size of RNDIS_MSG_SET command
USB: gadget: validate interface OS descriptor requests
usb: core: Unregister device on component_add() failure
net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
usb: dwc3: gadget: Prevent core from processing stale TRBs
USB: serial: cp210x: add CPI Bulk Coin Recycler id
USB: serial: cp210x: add NCR Retail IO box id
USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
usb: gadget: f_uac2: Define specific wTerminalType
usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
usb: raw-gadget: fix handling of dual-direction-capable endpoints
usb: usb251xb: add boost-up property support
usb: ulpi: Call of_node_put correctly
usb: ulpi: Move of_node_put to ulpi_dev_release
USB: serial: option: add ZTE MF286D modem
USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
usb: f_fs: Fix use-after-free for epfile
usb: dwc3: xilinx: fix uninitialized return value
Linus Torvalds [Sat, 12 Feb 2022 17:12:44 +0000 (09:12 -0800)]
Merge tag 's390-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
"Maintainers and reviewers changes:
- Add Alexander Gordeev as maintainer for s390.
- Christian Borntraeger will focus on s390 KVM maintainership and
stays as s390 reviewer.
Fixes:
- Fix clang build of modules loader KUnit test.
- Fix kernel panic in CIO code on FCES path-event when no driver is
attached to a device or the driver does not provide the path_event
function"
* tag 's390-5.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cio: verify the driver availability for path_event call
s390/module: fix building test_modules_helpers.o with clang
MAINTAINERS: downgrade myself to Reviewer for s390
MAINTAINERS: add Alexander Gordeev as maintainer for s390
Linus Torvalds [Sat, 12 Feb 2022 17:08:57 +0000 (09:08 -0800)]
Merge tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- Two small cleanups
- Another fix for addressing the EFI framebuffer above 4GB when running
as Xen dom0
- A patch to let Xen guests use reserved bits in MSI- and IO-APIC-
registers for extended APIC-IDs the same way KVM guests are doing it
already
* tag 'for-linus-5.17a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/pci: Make use of the helper macro LIST_HEAD()
xen/x2apic: Fix inconsistent indenting
xen/x86: detect support for extended destination ID
xen/x86: obtain full video frame buffer address for Dom0 also under EFI
Linus Torvalds [Sat, 12 Feb 2022 17:04:05 +0000 (09:04 -0800)]
Merge tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This fixes a corner case of fatal SIGSYS being ignored since v5.15.
Along with the signal fix is a change to seccomp so that seeing
another syscall after a fatal filter result will cause seccomp to kill
the process harder.
Summary:
- Force HANDLER_EXIT even for SIGNAL_UNKILLABLE
- Make seccomp self-destruct after fatal filter results
- Update seccomp samples for easier behavioral demonstration"
* tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
samples/seccomp: Adjust sample to also provide kill option
seccomp: Invalidate seccomp mode to catch death failures
signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
Linus Torvalds [Sat, 12 Feb 2022 16:57:37 +0000 (08:57 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"5 patches.
Subsystems affected by this patch series: binfmt, procfs, and mm
(vmscan, memcg, and kfence)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kfence: make test case compatible with run time set sample interval
mm: memcg: synchronize objcg lists with a dedicated spinlock
mm: vmscan: remove deadlock due to throttling failing to make progress
fs/proc: task_mmu.c: don't read mapcount for migration entry
fs/binfmt_elf: fix PT_LOAD p_align values for loaders
Jing Leng [Fri, 11 Feb 2022 09:27:36 +0000 (17:27 +0800)]
kconfig: fix failing to generate auto.conf
When the KCONFIG_AUTOCONFIG is specified (e.g. export \
KCONFIG_AUTOCONFIG=output/config/auto.conf), the directory of
include/config/ will not be created, so kconfig can't create deps
files in it and auto.conf can't be generated.
Peng Liu [Sat, 12 Feb 2022 00:32:35 +0000 (16:32 -0800)]
kfence: make test case compatible with run time set sample interval
The parameter kfence_sample_interval can be set via boot parameter and
late shell command, which is convenient for automated tests and KFENCE
parameter optimization. However, KFENCE test case just uses
compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test
case not run as users desired. Export kfence_sample_interval, so that
KFENCE test case can use run-time-set sample interval.
Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com Signed-off-by: Peng Liu <liupeng256@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Christian Knig <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Sat, 12 Feb 2022 00:32:32 +0000 (16:32 -0800)]
mm: memcg: synchronize objcg lists with a dedicated spinlock
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock: 00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
dump_stack_lvl+0x76/0x98
check_noncircular+0x136/0x158
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg. Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.
This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).
In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com Fixes: 22bf6f2deeca ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Roman Gushchin <guro@fb.com> Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This failure is specific to CONFIG_PREEMPT=n builds. The root of the
problem is that kcompact0 is not rescheduling on a CPU while a task that
has isolated a large number of the pages from the LRU is waiting on
kcompact0 to reschedule so the pages can be released. While
shrink_inactive_list() only loops once around too_many_isolated, reclaim
can continue without rescheduling if sc->skipped_deactivate == 1 which
could happen if there was no file LRU and the inactive anon list was not
low.
Link: https://lkml.kernel.org/r/20220203100326.GD3301@suse.de Fixes: 81c6ddde6c77 ("mm/vmscan: throttle reclaim and compaction when too may pages are isolated") Signed-off-by: Mel Gorman <mgorman@suse.de> Debugged-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Rik van Riel <riel@surriel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reproducer was trying to read /proc/$PID/smaps when calling
MADV_FREE at the mean time. MADV_FREE may split THPs if it is called
for partial THP. It may trigger the below race:
CPU A CPU B
----- -----
smaps walk: MADV_FREE:
page_mapcount()
PageCompound()
split_huge_page()
page = compound_head(page)
PageDoubleMap(page)
When calling PageDoubleMap() this page is not a tail page of THP anymore
so the BUG is triggered.
This could be fixed by elevated refcount of the page before calling
mapcount, but that would prevent it from counting migration entries, and
it seems overkilling because the race just could happen when PMD is
split so all PTE entries of tail pages are actually migration entries,
and smaps_account() does treat migration entries as mapcount == 1 as
Kirill pointed out.
Add a new parameter for smaps_account() to tell this entry is migration
entry then skip calling page_mapcount(). Don't skip getting mapcount
for device private entries since they do track references with mapcount.
Pagemap also has the similar issue although it was not reported. Fixed
it as well.
[shy828301@gmail.com: v4] Link: https://lkml.kernel.org/r/20220203182641.824731-1-shy828301@gmail.com
[nathan@kernel.org: avoid unused variable warning in pagemap_pmd_range()] Link: https://lkml.kernel.org/r/20220207171049.1102239-1-nathan@kernel.org Link: https://lkml.kernel.org/r/20220120202805.3369-1-shy828301@gmail.com Fixes: 5c008b5b928f ("thp: reintroduce split_huge_page()") Signed-off-by: Yang Shi <shy828301@gmail.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reported-by: syzbot+1f52b3a18d5633fa7f82@syzkaller.appspotmail.com Acked-by: David Hildenbrand <david@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Jann Horn <jannh@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Sat, 12 Feb 2022 00:32:22 +0000 (16:32 -0800)]
fs/binfmt_elf: fix PT_LOAD p_align values for loaders
Rui Salvaterra reported that Aisleroit solitaire crashes with "Wrong
__data_start/_end pair" assertion from libgc after update to v5.17-rc1.
Bisection pointed to commit ce42c79a5dcd ("fs/binfmt_elf: use PT_LOAD
p_align values for static PIE") that fixed handling of static PIEs, but
made the condition that guards load_bias calculation to exclude loader
binaries.
Restoring the check for presence of interpreter fixes the problem.
Link: https://lkml.kernel.org/r/20220202121433.3697146-1-rppt@kernel.org Fixes: ce42c79a5dcd ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Tested-by: Rui Salvaterra <rsalvaterra@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H.J. Lu" <hjl.tools@gmail.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 11 Feb 2022 21:40:03 +0000 (13:40 -0800)]
Merge tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"This is a fairly large set of bugfixes, most of which had been sent a
while ago but only now made it into the soc tree:
Maintainer file updates:
- Claudiu Beznea now co-maintains the at91 soc family, replacing
Ludovic Desroches.
- Michael Walle maintains the sl28cpld drivers
- Alain Volmat and Raphael Gallais-Pou take over some drivers for ST
platforms
- Alim Akhtar is an additional reviewer for Samsung platforms
Code fixes:
- Op-tee had a problem with object lifetime that needs a slightly
complex fix, as well as another bug with error handling.
- Several minor issues for the OMAP platform, including a regression
with the timer
- A Kconfig change to fix a build-time issue on Intel SoCFPGA
Device tree fixes:
- The Amlogic Meson platform fixes a boot regression on am1-odroid, a
spurious interrupt, and a problem with reserved memory regions
- In the i.MX platform, several bug fixes are needed to make devices
work correctly: SD card detection, alarmtimer, and sound card on
some board. One patch for the GPU got in there by accident and gets
reverted again.
- TI K3 needs a fix for J721S2 serial port numbers
- ux500 needs a fix to mount the SD card as root on the Skomer phone"
* tag 'soc-fixes-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (46 commits)
Revert "arm64: dts: imx8mn-venice-gw7902: disable gpu"
arm64: Remove ARCH_VULCAN
MAINTAINERS: add myself as a maintainer for the sl28cpld
MAINTAINERS: add IRC to ARM sub-architectures and Devicetree
MAINTAINERS: arm: samsung: add Git tree and IRC
ARM: dts: Fix boot regression on Skomer
ARM: dts: spear320: Drop unused and undocumented 'irq-over-gpio' property
soc: aspeed: lpc-ctrl: Block error printing on probe defer cases
docs/ABI: testing: aspeed-uart-routing: Escape asterisk
MAINTAINERS: update drm/stm drm/sti and cec/sti maintainers
MAINTAINERS: Update Benjamin Gaignard maintainer status
ARM: socfpga: fix missing RESET_CONTROLLER
arm64: dts: meson-sm1-odroid: fix boot loop after reboot
arm64: dts: meson-g12: drop BL32 region from SEI510/SEI610
arm64: dts: meson-g12: add ATF BL32 reserved-memory region
arm64: dts: meson-gx: add ATF BL32 reserved-memory region
arm64: dts: meson-sm1-bananapi-m5: fix wrong GPIO domain for GPIOE_2
arm64: dts: meson-sm1-odroid: use correct enable-gpio pin for tf-io regulator
arm64: dts: meson-g12b-odroid-n2: fix typo 'dio2133'
optee: use driver internal tee_context for some rpc
...
Linus Torvalds [Fri, 11 Feb 2022 20:55:17 +0000 (12:55 -0800)]
Merge tag 'pci-v5.17-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert a commit that reduced the number of IRQs used but resulted in
interrupt storms (Bjorn Helgaas)"
* tag 'pci-v5.17-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI/portdrv: Do not setup up IRQs if there are no users"
ac1fdc128b49 ("PCI/portdrv: Do not setup up IRQs if there are no users")
reduced usage of IRQs when we don't think we need them. But Joey, Sergiu,
and David reported choppy GUI rendering, systems that became unresponsive
every few seconds, incorrect values reported by cpufreq, and high IRQ 16
CPU usage.
Joey bisected the issues to ac1fdc128b49, so revert it until we figure out
a better solution.
Linus Torvalds [Fri, 11 Feb 2022 20:02:09 +0000 (12:02 -0800)]
Merge tag 'riscv-for-linus-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- A fix to avoid undefined behavior when stack backtracing, which
manifests in GCC as incorrect stack addresses
- A few fixes for the XIP kernels
- A fix to tracking NUMA state on CPU hotplug
- Support for the recently relesaed binutils-2.38, which changed the
default ISA version to one without CSRs or fence.i in 'I' extension
* tag 'riscv-for-linus-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: fix build with binutils 2.38
riscv: cpu-hotplug: clear cpu from numa map when teardown
riscv: extable: fix err reg writing in dedicated uaccess handler
riscv/mm: Add XIP_FIXUP for riscv_pfn_base
riscv/mm: Add XIP_FIXUP for phys_ram_base
riscv: Fix XIP_FIXUP_FLASH_OFFSET
riscv: eliminate unreliable __builtin_frame_address(1)
Linus Torvalds [Fri, 11 Feb 2022 19:55:26 +0000 (11:55 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Enable Cortex-A510 erratum 2051678 by default as we do with other
errata.
- arm64 IORT: Check the node revision for PMCG resources to cope with
old firmware based on a broken revision of the spec that had no way
to describe the second register page (when an implementation is using
the recommended RELOC_CTRS feature).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ACPI/IORT: Check node revision for PMCG resources
arm64: Enable Cortex-A510 erratum 2051678 by default
Linus Torvalds [Fri, 11 Feb 2022 19:48:13 +0000 (11:48 -0800)]
Merge tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert two commits that turned out to be problematic and fix two
issues related to wakeup from suspend-to-idle on x86.
Specifics:
- Revert a recent change that attempted to avoid issues with
conflicting address ranges during PCI initialization, because it
turned out to introduce a regression (Hans de Goede).
- Revert a change that limited EC GPE wakeups from suspend-to-idle to
systems based on Intel hardware, because it turned out that systems
based on hardware from other vendors depended on that functionality
too (Mario Limonciello).
- Fix two issues related to the handling of wakeup interrupts and
wakeup events signaled through the EC GPE during suspend-to-idle on
x86 (Rafael Wysocki)"
* tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
PM: s2idle: ACPI: Fix wakeup interrupts handling
ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
Linus Torvalds [Fri, 11 Feb 2022 19:36:32 +0000 (11:36 -0800)]
Merge tag 'gfs2-v5.16-rc3-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
- Revert debug commit that causes unexpected data corruption
- Fix muti-block reservation regression
* tag 'gfs2-v5.16-rc3-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Fix gfs2_release for non-writers regression
Revert "gfs2: check context in gfs2_glock_put"
Linus Torvalds [Fri, 11 Feb 2022 19:26:07 +0000 (11:26 -0800)]
Merge tag 'block-5.17-2022-02-11' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request
- nvme-tcp: fix bogus request completion when failing to send AER
(Sagi Grimberg)
- add the missing nvme_complete_req tracepoint for batched
completion (Bean Huo)
- Revert of the loop async autoclear issue that has continued to plague
us this release. A few patchsets exists to improve this, but they are
too invasive to be considered at this point (Tetsuo)
* tag 'block-5.17-2022-02-11' of git://git.kernel.dk/linux-block:
loop: revert "make autoclear operation asynchronous"
nvme-tcp: fix bogus request completion when failing to send AER
nvme: add nvme_complete_req tracepoint for batched completion
Linus Torvalds [Fri, 11 Feb 2022 19:18:42 +0000 (11:18 -0800)]
Merge tag 'io_uring-5.17-2022-02-11' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- Fix a false-positive warning from an older gcc (Alviro)
- Allow oom killer invocations from io_uring_setup (Shakeel)
* tag 'io_uring-5.17-2022-02-11' of git://git.kernel.dk/linux-block:
mm: io_uring: allow oom-killer from io_uring_setup
io_uring: Clean up a false-positive warning from GCC 9.3.0
Linus Torvalds [Fri, 11 Feb 2022 19:05:49 +0000 (11:05 -0800)]
Merge tag 'gpio-fixes-for-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- use sleeping variants of GPIO accessors where needed
in gpio-aggregator
- never return kernel's internal error codes to user-space
in gpiolib core
- use the correct register for reading output values in
gpio-sifive
- fix line hogging in gpio-sim
* tag 'gpio-fixes-for-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: fix hogs with custom chip labels
gpio: sifive: use the correct register to read output values
gpiolib: Never return internal error codes to user space
gpio: aggregator: Fix calling into sleeping GPIO controllers
Linus Torvalds [Fri, 11 Feb 2022 18:42:31 +0000 (10:42 -0800)]
Merge tag 'ata-5.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata fixes from Damien Le Moal:
"A couple of additional fixes for 5.17-rc4:
- Fix compilation warnings in the sata_fsl driver (powerpc) (me)
- Disable TRIM commands on M88V29 devices as these commands are
failing despite the device reporting it supports TRIM (Zoltan)"
* tag 'ata-5.17-rc4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-core: Disable TRIM on M88V29
ata: sata_fsl: fix sscanf() and sysfs_emit() format strings
* tag 'drm-fixes-2022-02-11' of git://anongit.freedesktop.org/drm/drm: (25 commits)
drm/amdgpu/display: change pipe policy for DCN 2.0
drm/amd/pm: fix hwmon node of power1_label create issue
drm/amd/display: keep eDP Vdd on when eDP stream is already enabled
drm/amd/display: fix yellow carp wm clamping
drm/amd/display: Cap pflip irqs per max otg number
drm/amdgpu: add utcl2_harvest to gc 10.3.1
display/amd: decrease message verbosity about watermarks table failure
drm/rockchip: vop: Correct RK3399 VOP register fields
drm/rockchip: dw_hdmi: Do not leave clock enabled in error case
MAINTAINERS: Add entry for fbdev core
fbcon: Avoid 'cap' set but not used warning
drm/privacy-screen: Fix sphinx warning
drm/i915: Workaround broken BIOS DBUF configuration on TGL/RKL
drm/i915: Populate pipe dbuf slices more accurately during readout
drm/i915: Allow !join_mbus cases for adlp+ dbuf configuration
drm/i915: Fix header test for !CONFIG_X86
drm/i915/ttm: Return some errors instead of trying memcpy move
drm/i915: Disable DRRS on IVB/HSW port != A
drm/i915: Fix oops due to missing stack depot
drm/vc4: crtc: Fix redundant variable assignment
...
Bob Peterson [Tue, 18 Jan 2022 14:30:18 +0000 (09:30 -0500)]
gfs2: Fix gfs2_release for non-writers regression
When a file is opened for writing, the vfs code (do_dentry_open)
calls get_write_access for the inode, thus incrementing the inode's write
count. That writer normally then creates a multi-block reservation for
the inode (i_res) that can be re-used by other writers, which speeds up
writes for applications that stupidly loop on open/write/close.
When the writes are all done, the multi-block reservation should be
deleted when the file is closed by the last "writer."
Commit 693c28f1fd48 broke that concept when it moved the call to
gfs2_rs_delete before the check for FMODE_WRITE. Non-writers have no
business removing the multi-block reservations of writers. In fact, if
someone opens and closes the file for RO while a writer has a
multi-block reservation, the RO closer will delete the reservation
midway through the write, and this results in:
kernel BUG at fs/gfs2/rgrp.c:677! (or thereabouts) which is:
BUG_ON(rs->rs_requested); from function gfs2_rs_deltree.
This patch moves the check back inside the check for FMODE_WRITE.
Fixes: 693c28f1fd48 ("gfs2: Check for active reservation in gfs2_release") Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
It turns out that the might_sleep() call that commit b7af757f5333 adds
is triggering occasional data corruption in testing. We're not sure
about the root cause yet, but since this commit was added as a debugging
aid only, revert it for now.
The kernel test robot is reporting that xfstest which does
umount ext2 on xfs
umount xfs
sequence started failing, for commit 79abac9e9a08c9d4 ("loop: make
autoclear operation asynchronous") removed a guarantee that fput() of
backing file is processed before lo_release() from close() returns to
user mode.
And syzbot is reporting that deferring destroy_workqueue() from
__loop_clr_fd() to a WQ context did not help [1]. Revert that commit.
Mathias Krause [Mon, 7 Feb 2022 15:01:19 +0000 (16:01 +0100)]
iio: buffer: Fix file related error handling in IIO_BUFFER_GET_FD_IOCTL
If we fail to copy the just created file descriptor to userland, we
try to clean up by putting back 'fd' and freeing 'ib'. The code uses
put_unused_fd() for the former which is wrong, as the file descriptor
was already published by fd_install() which gets called internally by
anon_inode_getfd().
This makes the error handling code leaving a half cleaned up file
descriptor table around and a partially destructed 'file' object,
allowing userland to play use-after-free tricks on us, by abusing
the still usable fd and making the code operate on a dangling
'file->private_data' pointer.
Instead of leaving the kernel in a partially corrupted state, don't
attempt to explicitly clean up and leave this to the process exit
path that'll release any still valid fds, including the one created
by the previous call to anon_inode_getfd(). Simply return -EFAULT to
indicate the error.
Fixes: eee7fa8d8695 ("iio: buffer: add ioctl() to support opening extra buffers for IIO device") Cc: stable@kernel.org Cc: Jonathan Cameron <jic23@kernel.org> Cc: Alexandru Ardelean <ardeleanalex@gmail.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Nuno Sa <Nuno.Sa@analog.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mathias Krause <minipli@grsecurity.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fabrice Gasnier [Wed, 9 Feb 2022 16:15:53 +0000 (17:15 +0100)]
usb: dwc2: drd: fix soft connect when gadget is unconfigured
When the gadget driver hasn't been (yet) configured, and the cable is
connected to a HOST, the SFTDISCON gets cleared unconditionally, so the
HOST tries to enumerate it.
At the host side, this can result in a stuck USB port or worse. When
getting lucky, some dmesg can be observed at the host side:
new high-speed USB device number ...
device descriptor read/64, error -110
Fix it in drd, by checking the enabled flag before calling
dwc2_hsotg_core_connect(). It will be called later, once configured,
by the normal flow:
- udc_bind_to_driver
- usb_gadget_connect
- dwc2_hsotg_pullup
- dwc2_hsotg_core_connect
usb: core: Unregister device on component_add() failure
Commit 1253a43d05cd ("usb: Link the ports to the connectors they are
attached to") creates a link to the USB Type-C connector for every new
port that is added when possible. If component_add() fails,
usb_hub_create_port_device() prints a warning but does not unregister
the device and does not return errors to the callers.
Syzbot reported a "WARNING in component_del()".
Fix this issue in usb_hub_create_port_device by calling device_unregister()
and returning the errors from component_add().
Fixes: 1253a43d05cd ("usb: Link the ports to the connectors they are attached to") Reported-and-tested-by: syzbot+60df062e1c41940cae0f@syzkaller.appspotmail.com Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220209164500.8769-1-fmdefrancesco@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jann Horn [Wed, 26 Jan 2022 13:14:52 +0000 (14:14 +0100)]
net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
ax88179_rx_fixup() contains several out-of-bounds accesses that can be
triggered by a malicious (or defective) USB device, in particular:
- The metadata array (hdr_off..hdr_off+2*pkt_cnt) can be out of bounds,
causing OOB reads and (on big-endian systems) OOB endianness flips.
- A packet can overlap the metadata array, causing a later OOB
endianness flip to corrupt data used by a cloned SKB that has already
been handed off into the network stack.
- A packet SKB can be constructed whose tail is far beyond its end,
causing out-of-bounds heap data to be considered part of the SKB's
data.
I have tested that this can be used by a malicious USB device to send a
bogus ICMPv6 Echo Request and receive an ICMPv6 Echo Reply in response
that contains random kernel heap data.
It's probably also possible to get OOB writes from this on a
little-endian system somehow - maybe by triggering skb_cow() via IP
options processing -, but I haven't tested that.
Kees Cook [Tue, 8 Feb 2022 04:21:13 +0000 (20:21 -0800)]
seccomp: Invalidate seccomp mode to catch death failures
If seccomp tries to kill a process, it should never see that process
again. To enforce this proactively, switch the mode to something
impossible. If encountered: WARN, reject all syscalls, and attempt to
kill the process again even harder.
Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Fixes: 29c23e51a493 ("seccomp: remove 2-phase API") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
Kees Cook [Tue, 8 Feb 2022 08:57:17 +0000 (00:57 -0800)]
signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
Fatal SIGSYS signals (i.e. seccomp RET_KILL_* syscall filter actions)
were not being delivered to ptraced pid namespace init processes. Make
sure the SIGNAL_UNKILLABLE doesn't get set for these cases.
Reported-by: Robert Święcki <robert@swiecki.net> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com> Fixes: a2a2dd8f85de ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Link: https://lore.kernel.org/lkml/878rui8u4a.fsf@email.froward.int.ebiederm.org
Dave Airlie [Fri, 11 Feb 2022 02:32:45 +0000 (12:32 +1000)]
Merge tag 'drm-intel-fixes-2022-02-10' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Build fix for non-x86 platforms after remap_io_mmapping changes. (Lucas De Marchi)
- Correctly propagate errors during object migration blits. (Thomas Hellström)
- Disable DRRS support on HSW/IVB where it is not implemented yet. (Ville Syrjälä)
- Correct pipe dbuf BIOS configuration during readout. (Ville Syrjälä)
- Properly sanitise BIOS buf configuration on ADL-P+ for !join_mbus cases. (Ville Syrjälä)
- Fix oops due to missing stack depot. (Ville Syrjälä)
- Workaround broken BIOS DBUF configuration on TGL/RKL. (Ville Syrjälä)
Linus Torvalds [Fri, 11 Feb 2022 00:01:22 +0000 (16:01 -0800)]
Merge tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter and can.
Current release - new code bugs:
- sparx5: fix get_stat64 out-of-bound access and crash
- smc: fix netdev ref tracker misuse
Previous releases - regressions:
- eth: ixgbevf: require large buffers for build_skb on 82599VF, avoid
overflows
- eth: ocelot: fix all IP traffic getting trapped to CPU with PTP
over IP
- bonding: fix rare link activation misses in 802.3ad mode
Previous releases - always broken:
- tcp: fix tcp sock mem accounting in zero-copy corner cases
- remove the cached dst when uncloning an skb dst and its metadata,
since we only have one ref it'd lead to an UaF
- netfilter:
- conntrack: don't refresh sctp entries in closed state
- conntrack: re-init state for retransmitted syn-ack, avoid
connection establishment getting stuck with strange stacks
- ctnetlink: disable helper autoassign, avoid it getting lost
- nft_payload: don't allow transport header access for fragments
- dsa: fix use of devres for mdio throughout drivers
- eth: amd-xgbe: disable interrupts during pci removal
- eth: dpaa2-eth: unregister netdev before disconnecting the PHY
- eth: ice: fix IPIP and SIT TSO offload"
* tag 'net-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
net: mscc: ocelot: fix mutex lock error during ethtool stats read
ice: Avoid RTNL lock when re-creating auxiliary device
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
ice: fix IPIP and SIT TSO offload
ice: fix an error code in ice_cfg_phy_fec()
net: mpls: Fix GCC 12 warning
dpaa2-eth: unregister the netdev before disconnecting from the PHY
skbuff: cleanup double word in comment
net: macb: Align the dma and coherent dma masks
mptcp: netlink: process IPv6 addrs in creating listening sockets
selftests: mptcp: add missing join check
net: usb: qmi_wwan: Add support for Dell DW5829e
vlan: move dev_put into vlan_dev_uninit
vlan: introduce vlan_dev_free_egress_priority
ax25: fix UAF bugs of net_device caused by rebinding operation
net: dsa: fix panic when DSA master device unbinds on shutdown
net: amd-xgbe: disable interrupts during pci removal
tipc: rate limit warning for received illegal binding update
net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
...
Reinette Chatre [Tue, 8 Feb 2022 18:48:07 +0000 (10:48 -0800)]
x86/sgx: Silence softlockup detection when releasing large enclaves
Vijay reported that the "unclobbered_vdso_oversubscribed" selftest
triggers the softlockup detector.
Actual SGX systems have 128GB of enclave memory or more. The
"unclobbered_vdso_oversubscribed" selftest creates one enclave which
consumes all of the enclave memory on the system. Tearing down such a
large enclave takes around a minute, most of it in the loop where
the EREMOVE instruction is applied to each individual 4k enclave page.
Spending one minute in a loop triggers the softlockup detector.
Add a cond_resched() to give other tasks a chance to run and placate
the softlockup detector.
Cc: stable@vger.kernel.org Fixes: 77dac496e929 ("x86/sgx: Add a page reclaimer") Reported-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> (kselftest as sanity check) Link: https://lkml.kernel.org/r/ced01cac1e75f900251b0a4ae1150aa8ebd295ec.1644345232.git.reinette.chatre@intel.com
Linus Torvalds [Thu, 10 Feb 2022 23:42:48 +0000 (15:42 -0800)]
Merge tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Build and run-time fixes to pidfd, clone3, and ir tests"
* tag 'linux-kselftest-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ir: fix build with ancient kernel headers
selftests: fixup build warnings in pidfd / clone3 tests
pidfd: fix test failure due to stack overflow on some arches
Linus Torvalds [Thu, 10 Feb 2022 23:39:59 +0000 (15:39 -0800)]
Merge tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"Fixes to the test and usage documentation"
* tag 'linux-kselftest-kunit-fixes-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: KUnit: Fix usage bug
kunit: fix missing f in f-string in run_checks.py
Vladimir Oltean [Thu, 10 Feb 2022 17:40:17 +0000 (19:40 +0200)]
net: dsa: mv88e6xxx: fix use-after-free in mv88e6xxx_mdios_unregister
Since struct mv88e6xxx_mdio_bus *mdio_bus is the bus->priv of something
allocated with mdiobus_alloc_size(), this means that mdiobus_free(bus)
will free the memory backing the mdio_bus as well. Therefore, the
mdio_bus->list element is freed memory, but we continue to iterate
through the list of MDIO buses using that list element.
To fix this, use the proper list iterator that handles element deletion
by keeping a copy of the list element next pointer.
Fixes: ee83a90e4a43 ("net: dsa: mv88e6xxx: don't use devres for mdiobus") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220210174017.3271099-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Thu, 10 Feb 2022 19:45:35 +0000 (11:45 -0800)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2022-02-10
Dan Carpenter propagates an error in FEC configuration.
Jesse fixes TSO offloads of IPIP and SIT frames.
Dave adds a dedicated LAG unregister function to resolve a KASAN error
and moves auxiliary device re-creation after LAG removal to the service
task to avoid issues with RTNL lock.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
ice: Avoid RTNL lock when re-creating auxiliary device
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
ice: fix IPIP and SIT TSO offload
ice: fix an error code in ice_cfg_phy_fec()
====================
An ongoing workqueue populates the stats buffer. At the same time, a user
might query the statistics. While writing to the buffer is mutex-locked,
reading from the buffer wasn't. This could lead to buggy reads by ethtool.
This patch fixes the former blamed commit, but the bug was introduced in
the latter.
Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Fixes: a3069a22337ad ("ocelot: Clean up stats update deferred work") Fixes: 99c1e05296dda ("net: mscc: Add initial Ocelot switch support") Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/all/20220210150451.416845-2-colin.foster@in-advantage.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Aurelien Jarno [Wed, 26 Jan 2022 17:14:42 +0000 (18:14 +0100)]
riscv: fix build with binutils 2.38
From version 2.38, binutils default to ISA spec version 20191213. This
means that the csr read/write (csrr*/csrw*) instructions and fence.i
instruction has separated from the `I` extension, become two standalone
extensions: Zicsr and Zifencei. As the kernel uses those instruction,
this causes the following build failure:
Pingfan Liu [Sun, 23 Jan 2022 12:13:52 +0000 (20:13 +0800)]
riscv: cpu-hotplug: clear cpu from numa map when teardown
There is numa_add_cpu() when cpus online, accordingly, there should be
numa_remove_cpu() when cpus offline.
Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Fixes: 6acc2c7ec784 ("riscv: Add numa support for riscv64 platform") Cc: stable@vger.kernel.org
[Palmer: Add missing NUMA include] Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Dave Ertman [Fri, 21 Jan 2022 00:27:56 +0000 (16:27 -0800)]
ice: Avoid RTNL lock when re-creating auxiliary device
If a call to re-create the auxiliary device happens in a context that has
already taken the RTNL lock, then the call flow that recreates auxiliary
device can hang if there is another attempt to claim the RTNL lock by the
auxiliary driver.
To avoid this, any call to re-create auxiliary devices that comes from
an source that is holding the RTNL lock (e.g. netdev notifier when
interface exits a bond) should execute in a separate thread. To
accomplish this, add a flag to the PF that will be evaluated in the
service task and dealt with there.
Fixes: dfd6ba335c7f ("ice: Register auxiliary device to provide RDMA") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dave Ertman [Tue, 18 Jan 2022 21:08:20 +0000 (13:08 -0800)]
ice: Fix KASAN error in LAG NETDEV_UNREGISTER handler
Currently, the same handler is called for both a NETDEV_BONDING_INFO
LAG unlink notification as for a NETDEV_UNREGISTER call. This is
causing a problem though, since the netdev_notifier_info passed has
a different structure depending on which event is passed. The problem
manifests as a call trace from a BUG: KASAN stack-out-of-bounds error.
Fix this by creating a handler specific to NETDEV_UNREGISTER that only
is passed valid elements in the netdev_notifier_info struct for the
NETDEV_UNREGISTER event.
Also included is the removal of an unbalanced dev_put on the peer_netdev
and related braces.
Fixes: 97baf07c647a ("ice: Respond to a NETDEV_UNREGISTER event for LAG") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jesse Brandeburg [Fri, 14 Jan 2022 23:38:39 +0000 (15:38 -0800)]
ice: fix IPIP and SIT TSO offload
The driver was avoiding offload for IPIP (at least) frames due to
parsing the inner header offsets incorrectly when trying to check
lengths.
This length check works for VXLAN frames but fails on IPIP frames
because skb_transport_offset points to the inner header in IPIP
frames, which meant the subtraction of transport_header from
inner_network_header returns a negative value (-20).
With the code before this patch, everything continued to work, but GSO
was being used to segment, causing throughputs of 1.5Gb/s per thread.
After this patch, throughput is more like 10Gb/s per thread for IPIP
traffic.
Fixes: c0973e6db019 ("ice: Implement filter sync, NDO operations and bump version") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Dan Carpenter [Fri, 7 Jan 2022 08:02:06 +0000 (11:02 +0300)]
ice: fix an error code in ice_cfg_phy_fec()
Propagate the error code from ice_get_link_default_override() instead
of returning success.
Fixes: 179881008856 ("ice: add link lenient and default override support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Victor Erminpour [Thu, 10 Feb 2022 00:28:38 +0000 (16:28 -0800)]
net: mpls: Fix GCC 12 warning
When building with automatic stack variable initialization, GCC 12
complains about variables defined outside of switch case statements.
Move the variable outside the switch, which silences the warning:
./net/mpls/af_mpls.c:1624:21: error: statement will never be executed [-Werror=switch-unreachable]
1624 | int err;
| ^~~
Signed-off-by: Victor Erminpour <victor.erminpour@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-eth: unregister the netdev before disconnecting from the PHY
The netdev should be unregistered before we are disconnecting from the
MAC/PHY so that the dev_close callback is called and the PHY and the
phylink workqueues are actually stopped before we are disconnecting and
destroying the phylink instance.
Fixes: 7579803768fc ("dpaa2-eth: add MAC/PHY support through phylink") Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marc St-Amand [Wed, 9 Feb 2022 09:43:25 +0000 (15:13 +0530)]
net: macb: Align the dma and coherent dma masks
Single page and coherent memory blocks can use different DMA masks
when the macb accesses physical memory directly. The kernel is clever
enough to allocate pages that fit into the requested address width.
When using the ARM SMMU, the DMA mask must be the same for single
pages and big coherent memory blocks. Otherwise the translation
tables turn into one big mess.
[ 74.959909] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
[ 74.959989] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1
[ 75.173939] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
[ 75.173955] arm-smmu fd800000.smmu: Unhandled context fault: fsr=0x402, iova=0x3165687460, fsynr=0x20001, cbfrsynra=0x877, cb=1
Since using the same DMA mask does not hurt direct 1:1 physical
memory mappings, this commit always aligns DMA and coherent masks.
Signed-off-by: Marc St-Amand <mstamand@ciena.com> Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jens Axboe [Thu, 10 Feb 2022 13:56:43 +0000 (06:56 -0700)]
Merge tag 'nvme-5.17-2022-02-10' of git://git.infradead.org/nvme into block-5.17
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.17
- nvme-tcp: fix bogus request completion when failing to send AER
(Sagi Grimberg)
- add the missing nvme_complete_req tracepoint for batched completion
(Bean Huo)"
* tag 'nvme-5.17-2022-02-10' of git://git.infradead.org/nvme:
nvme-tcp: fix bogus request completion when failing to send AER
nvme: add nvme_complete_req tracepoint for batched completion
Linus Torvalds [Thu, 10 Feb 2022 13:43:43 +0000 (05:43 -0800)]
Merge tag 'audit-pr-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"Another audit fix, this time a single rather small but important fix
for an oops/page-fault caused by improperly accessing userspace
memory"
* tag 'audit-pr-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: don't deref the syscall args when checking the openat2 open_how::flags
Jon Maloy [Sat, 5 Feb 2022 19:11:18 +0000 (14:11 -0500)]
tipc: improve size validations for received domain records
The function tipc_mon_rcv() allows a node to receive and process
domain_record structs from peer nodes to track their views of the
network topology.
This patch verifies that the number of members in a received domain
record does not exceed the limit defined by MAX_MON_DOMAIN, something
that may otherwise lead to a stack overflow.
tipc_mon_rcv() is called from the function tipc_link_proto_rcv(), where
we are reading a 32 bit message data length field into a uint16. To
avert any risk of bit overflow, we add an extra sanity check for this in
that function. We cannot see that happen with the current code, but
future designers being unaware of this risk, may introduce it by
allowing delivery of very large (> 64k) sk buffers from the bearer
layer. This potential problem was identified by Eric Dumazet.
This fixes CVE-2022-0435
Reported-by: Samuel Page <samuel.page@appgate.com> Reported-by: Eric Dumazet <edumazet@google.com> Fixes: edaafc288dc9 ("tipc: add neighbor monitoring framework") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Samuel Page <samuel.page@appgate.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roger Pau Monne [Thu, 20 Jan 2022 15:25:27 +0000 (16:25 +0100)]
xen/x86: detect support for extended destination ID
Xen allows the usage of some previously reserved bits in the IO-APIC
RTE and the MSI address fields in order to store high bits for the
target APIC ID. Such feature is already implemented by QEMU/KVM and
HyperV, so in order to enable it just add the handler that checks for
it's presence.
Jan Beulich [Mon, 7 Feb 2022 07:41:03 +0000 (08:41 +0100)]
xen/x86: obtain full video frame buffer address for Dom0 also under EFI
The initial change would not work when Xen was booted from EFI: There is
an early exit from the case block in that case. Move the necessary code
ahead of that.
Fixes: fa779ff427bc ("xen/x86: obtain upper 32 bits of video frame buffer address for Dom0") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/2501ce9d-40e5-b49d-b0e5-435544d17d4a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
Matthieu Baerts [Thu, 10 Feb 2022 01:25:07 +0000 (17:25 -0800)]
selftests: mptcp: add missing join check
This function also writes the name of the test with its ID, making clear
a new test has been executed.
Without that, the ADD_ADDR results from this test was appended at the
end of the previous test causing confusions. Especially when the second
test was failing, we had:
17 signal invalid addresses syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
add[fail] got 2 ADD_ADDR[s] expected 3
In fact, this 17th test was OK but not the 18th one.
Now we have:
17 signal invalid addresses syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ]
18 signal addresses race test syn[fail] got 2 JOIN[s] syn expected 3
- synack[fail] got 2 JOIN[s] synack expected
- ack[fail] got 2 JOIN[s] ack expected 3
add[fail] got 2 ADD_ADDR[s] expected 3
Fixes: 22edb37d483d ("selftests: mptcp: add_addr and echo race test") Reported-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Slark Xiao [Wed, 9 Feb 2022 02:47:17 +0000 (10:47 +0800)]
net: usb: qmi_wwan: Add support for Dell DW5829e
Dell DW5829e same as DW5821e except the CAT level.
DW5821e supports CAT16 but DW5829e supports CAT9.
Also, DW5829e includes normal and eSIM type.
Please see below test evidence: