This is a check to ensure that the extents have been read into
memory before we are doing a ifork btree manipulation. This assert
is bogus in the above case.
We have a fragmented directory block that has more extents in it
than can fit in extent format, so the inode data fork is in btree
format. xfs_dir2_shrink_inode() asks to remove all remaining 16
filesystem blocks from the inode so it can convert to short form,
and __xfs_bunmapi() removes all the extents. We now have a data fork
in btree format but have zero extents in the fork. This incorrectly
trips the xfs_need_iread_extents() assert because it assumes that an
empty extent btree means the extent tree has not been read into
memory yet. This is clearly not the case with xfs_bunmapi(), as it
has an explicit call to xfs_iread_extents() in it to pull the
extents into memory before it starts unmapping.
Also, the assert directly after this bogus one is:
ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE);
Which covers the context in which it is legal to call
xfs_bmap_btree_to_extents just fine. Hence we should just remove the
bogus assert as it is clearly wrong and causes a regression.
The returns the test behaviour to the pre-existing assert failure in
xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to
remove all the extents in the range it was asked to unmap.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Darrick J. Wong [Wed, 12 May 2021 19:51:26 +0000 (12:51 -0700)]
xfs: validate extsz hints against rt extent size when rtinherit is set
The RTINHERIT bit can be set on a directory so that newly created
regular files will have the REALTIME bit set to store their data on the
realtime volume. If an extent size hint (and EXTSZINHERIT) are set on
the directory, the hint will also be copied into the new file.
As pointed out in previous patches, for realtime files we require the
extent size hint be an integer multiple of the realtime extent, but we
don't perform the same validation on a directory with both RTINHERIT and
EXTSZINHERIT set, even though the only use-case of that combination is
to propagate extent size hints into new realtime files. This leads to
inode corruption errors when the bad values are propagated.
Because there may be existing filesystems with such a configuration, we
cannot simply amend the inode verifier to trip on these directories and
call it a day because that will cause previously "working" filesystems
to start throwing errors abruptly. Note that it's valid to have
directories with rtinherit set even if there is no realtime volume, in
which case the problem does not manifest because rtinherit is ignored if
there's no realtime device; and it's possible that someone set the flag,
crashed, repaired the filesystem (which clears the hint on the realtime
file) and continued.
Therefore, mitigate this issue in several ways: First, if we try to
write out an inode with both rtinherit/extszinherit set and an unaligned
extent size hint, turn off the hint to correct the error. Second, if
someone tries to misconfigure a directory via the fssetxattr ioctl, fail
the ioctl. Third, reverify both extent size hint values when we
propagate heritable inode attributes from parent to child, to prevent
misconfigurations from spreading.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Wed, 12 May 2021 19:49:19 +0000 (12:49 -0700)]
xfs: standardize extent size hint validation
While chasing a bug involving invalid extent size hints being propagated
into newly created realtime files, I noticed that the xfs_ioctl_setattr
checks for the extent size hints weren't the same as the ones now
encoded in libxfs and used for validation in repair and mkfs.
Because the checks in libxfs are more stringent than the ones in the
ioctl, it's possible for a live system to set inode flags that
immediately result in corruption warnings. Specifically, it's possible
to set an extent size hint on an rtinherit directory without checking if
the hint is aligned to the realtime extent size, which makes no sense
since that combination is used only to seed new realtime files.
Replace the open-coded and inadequate checks with the libxfs verifier
versions and update the code comments a bit.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Fri, 21 May 2021 00:15:49 +0000 (17:15 -0700)]
xfs: check free AG space when making per-AG reservations
The new online shrink code exposed a gap in the per-AG reservation
code, which is that we only return ENOSPC to callers if the entire fs
doesn't have enough free blocks. Except for debugging mode, the
reservation init code doesn't ever check that there's enough free space
in that AG to cover the reservation.
Not having enough space is not considered an immediate fatal error that
requires filesystem offlining because (a) it's shouldn't be possible to
wind up in that state through normal file operations and (b) even if
one did, freeing data blocks would recover the situation.
However, online shrink now needs to know if shrinking would not leave
enough space so that it can abort the shrink operation. Hence we need
to promote this assertion into an actual error return.
Observed by running xfs/168 with a 1k block size, though in theory this
could happen with any configuration.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Darrick J. Wong [Wed, 12 May 2021 23:43:10 +0000 (16:43 -0700)]
xfs: restore old ioctl definitions
These ioctl definitions in xfs_fs.h are part of the userspace ABI and
were mistakenly removed during the 5.13 merge window.
Fixes: bc486d6a68bb ("xfs: convert to fileattr") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong [Wed, 12 May 2021 23:41:13 +0000 (16:41 -0700)]
xfs: fix deadlock retry tracepoint arguments
sc->ip is the inode that's being scrubbed, which means that it's not set
for scrub types that don't involve inodes. If one of those scrubbers
(e.g. inode btrees) returns EDEADLOCK, we'll trip over the null pointer.
Fix that by reporting either the file being examined or the file that
was used to call scrub.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
Darrick J. Wong [Sun, 9 May 2021 23:22:55 +0000 (16:22 -0700)]
xfs: retry allocations when locality-based search fails
If a realtime allocation fails because we can't find a sufficiently
large free extent satisfying locality rules, relax the locality rules
and try again. This reduces the occurrence of short writes to realtime
files when the write size is large and the free space is fragmented.
This was originally discovered by running generic/186 with the realtime
reflink patchset and a 128k cow extent size hint, but the short write
symptoms can manifest with a 128k extent size hint and no reflink, so
apply the fix now.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Darrick J. Wong [Sun, 9 May 2021 23:22:54 +0000 (16:22 -0700)]
xfs: adjust rt allocation minlen when extszhint > rtextsize
xfs_bmap_rtalloc doesn't handle realtime extent files with extent size
hints larger than the rt volume's extent size properly, because
xfs_bmap_extsize_align can adjust the offset/length parameters to try to
fit the extent size hint.
Under these conditions, minlen has to be large enough so that any
allocation returned by xfs_rtallocate_extent will be large enough to
cover at least one of the blocks that the caller asked for. If the
allocation is too short, bmapi_write will return no mapping for the
requested range, which causes ENOSPC errors in other parts of the
filesystem.
Therefore, adjust minlen upwards to fix this. This can be found by
running generic/263 (g/127 or g/522) with a realtime extent size hint
that's larger than the rt volume extent size.
Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Linus Torvalds [Sun, 16 May 2021 17:13:14 +0000 (10:13 -0700)]
Merge tag 'driver-core-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two driver fixes for driver core changes that happened in
5.13-rc1.
The clk driver fix resolves a many-reported issue with booting some
devices, and the USB typec fix resolves the reported problem of USB
systems on some embedded boards.
Both of these have been in linux-next this week with no reported
issues"
* tag 'driver-core-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
clk: Skip clk provider registration when np is NULL
usb: typec: tcpm: Don't block probing of consumers of "connector" nodes
Linus Torvalds [Sun, 16 May 2021 16:55:05 +0000 (09:55 -0700)]
Merge tag 'usb-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.13-rc2. They consist of a number
of resolutions for reported issues:
- typec fixes for found problems
- xhci fixes and quirk additions
- dwc3 driver fixes
- minor fixes found by Coverity
- cdc-wdm fixes for reported problems
All of these have been in linux-next for a few days with no reported
issues"
* tag 'usb-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
usb: core: hub: fix race condition about TRSMRCY of resume
usb: typec: tcpm: Fix SINK_DISCOVERY current limit for Rp-default
xhci: Add reset resume quirk for AMD xhci controller.
usb: xhci: Increase timeout for HC halt
xhci: Do not use GFP_KERNEL in (potentially) atomic context
xhci: Fix giving back cancelled URBs even if halted endpoint can't reset
xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
usb: musb: Fix an error message
usb: typec: tcpm: Fix wrong handling for Not_Supported in VDM AMS
usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work
usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
usb: fotg210-hcd: Fix an error message
docs: usb: function: Modify path name
usb: dwc3: omap: improve extcon initialization
usb: typec: ucsi: Put fwnode in any case during ->probe()
usb: typec: tcpm: Fix wrong handling in GET_SINK_CAP
usb: dwc2: Remove obsolete MODULE_ constants from platform.c
usb: dwc3: imx8mp: fix error return code in dwc3_imx8mp_probe()
usb: dwc3: imx8mp: detect dwc3 core node via compatible string
usb: dwc3: gadget: Return success always for kick transfer in ep queue
...
Linus Torvalds [Sun, 16 May 2021 16:42:13 +0000 (09:42 -0700)]
Merge tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Two fixes for timers:
- Use the ALARM feature check in the alarmtimer core code insted of
the old method of checking for the set_alarm() callback.
Drivers can have that callback set but the feature bit cleared. If
such a RTC device is selected then alarms wont work.
- Use a proper define to let the preprocessor check whether Hyper-V
VDSO clocksource should be active.
The code used a constant in an enum with #ifdef, which evaluates to
always false and disabled the clocksource for VDSO"
* tag 'timers-urgent-2021-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86
alarmtimer: Check RTC features instead of ops
Linus Torvalds [Sun, 16 May 2021 16:39:04 +0000 (09:39 -0700)]
Merge tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- two patches for error path fixes
- a small series for fixing a regression with swiotlb with Xen on Arm
* tag 'for-linus-5.13b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/swiotlb: check if the swiotlb has already been initialized
arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required
xen/arm: move xen_swiotlb_detect to arm/swiotlb-xen.h
xen/unpopulated-alloc: fix error return code in fill_list()
xen/gntdev: fix gntdev_mmap() error exit path
Linus Torvalds [Sun, 16 May 2021 16:31:06 +0000 (09:31 -0700)]
Merge tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
"The three SEV commits are not really urgent material. But we figured
since getting them in now will avoid a huge amount of conflicts
between future SEV changes touching tip, the kvm and probably other
trees, sending them to you now would be best.
The idea is that the tip, kvm etc branches for 5.14 will all base
ontop of -rc2 and thus everything will be peachy. What is more, those
changes are purely mechanical and defines movement so they should be
fine to go now (famous last words).
Summary:
- Enable -Wundef for the compressed kernel build stage
- Reorganize SEV code to streamline and simplify future development"
* tag 'x86_urgent_for_v5.13_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Enable -Wundef
x86/msr: Rename MSR_K8_SYSCFG to MSR_AMD64_SYSCFG
x86/sev: Move GHCB MSR protocol and NAE definitions in a common header
x86/sev-es: Rename sev-es.{ch} to sev.{ch}
Linus Torvalds [Sat, 15 May 2021 17:24:48 +0000 (10:24 -0700)]
Merge tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix an idle CPU selection bug, and an AMD Ryzen maximum frequency
enumeration bug"
* tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations
sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu()
Linus Torvalds [Sat, 15 May 2021 16:42:27 +0000 (09:42 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"13 patches.
Subsystems affected by this patch series: resource, squashfs, hfsplus,
modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan,
pagemap, and ioremap)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/ioremap: fix iomap_max_page_shift
docs: admin-guide: update description for kernel.modprobe sysctl
hfsplus: prevent corruption in shrinking truncate
mm/filemap: fix readahead return types
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
mm: fix struct page layout on 32-bit systems
ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"
userfaultfd: release page in error path to avoid BUG_ON
squashfs: fix divide error in calculate_skip()
kernel/resource: fix return code check in __request_free_mem_region
mm, slub: move slub_debug static key enabling outside slab_mutex
mm/hugetlb: fix cow where page writtable in child
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
Linus Torvalds [Sat, 15 May 2021 16:01:45 +0000 (09:01 -0700)]
Merge tag 'arc-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE fixes
- syscall num check off-by-one bug
- misc fixes
* tag 'arc-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: mm: Use max_high_pfn as a HIGHMEM zone border
ARC: mm: PAE: use 40-bit physical page mask
ARC: entry: fix off-by-one error in syscall number validation
ARC: kgdb: add 'fallthrough' to prevent a warning
arc: Fix typos/spellos
Linus Torvalds [Sat, 15 May 2021 15:52:30 +0000 (08:52 -0700)]
Merge tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix for shared tag set exit (Bart)
- Correct ioctl range for zoned ioctls (Damien)
- Removed dead/unused function (Lin)
- Fix perf regression for shared tags (Ming)
- Fix out-of-bounds issue with kyber and preemption (Omar)
- BFQ merge fix (Paolo)
- Two error handling fixes for nbd (Sun)
- Fix weight update in blk-iocost (Tejun)
- NVMe pull request (Christoph):
- correct the check for using the inline bio in nvmet (Chaitanya
Kulkarni)
- demote unsupported command warnings (Chaitanya Kulkarni)
- fix corruption due to double initializing ANA state (me, Hou Pu)
- reset ns->file when open fails (Daniel Wagner)
- fix a NULL deref when SEND is completed with error in nvmet-rdma
(Michal Kalderon)
- Fix kernel-doc warning (Bart)
* tag 'block-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
block/partitions/efi.c: Fix the efi_partition() kernel-doc header
blk-mq: Swap two calls in blk_mq_exit_queue()
blk-mq: plug request for shared sbitmap
nvmet: use new ana_log_size instead the old one
nvmet: seset ns->file when open fails
nbd: share nbd_put and return by goto put_nbd
nbd: Fix NULL pointer in flush_workqueue
blkdev.h: remove unused codes blk_account_rq
block, bfq: avoid circular stable merges
blk-iocost: fix weight updates of inner active iocgs
nvmet: demote fabrics cmd parse err msg to debug
nvmet: use helper to remove the duplicate code
nvmet: demote discovery cmd parse err msg to debug
nvmet-rdma: Fix NULL deref when SEND is completed with error
nvmet: fix inline bio check for passthru
nvmet: fix inline bio check for bdev-ns
nvme-multipath: fix double initialization of ANA state
kyber: fix out of bounds access when preempted
block: uapi: fix comment about block device ioctl
Linus Torvalds [Sat, 15 May 2021 15:43:44 +0000 (08:43 -0700)]
Merge tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just a few minor fixes/changes:
- Fix issue with double free race for linked timeout completions
- Fix reference issue with timeouts
- Remove last few places that make SQPOLL special, since it's just an
io thread now.
- Bump maximum allowed registered buffers, as we don't allocate as
much anymore"
* tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block:
io_uring: increase max number of reg buffers
io_uring: further remove sqpoll limits on opcodes
io_uring: fix ltout double free on completion race
io_uring: fix link timeout refs
Linus Torvalds [Sat, 15 May 2021 15:37:21 +0000 (08:37 -0700)]
Merge tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fixes from Gao Xiang:
"This mainly fixes 1 lcluster-sized pclusters for the big pcluster
feature, which can be forcely generated by mkfs as a specific on-disk
case for per-(sub)file compression strategies but missed to handle in
runtime properly.
Also, documentation updates are included to fix the broken
illustration due to the ReST conversion by accident and complete the
big pcluster introduction.
Summary:
- update documentation to fix the broken illustration due to ReST
conversion by accident at that time and complete the big pcluster
introduction
- fix 1 lcluster-sized pclusters for the big pcluster feature"
* tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix 1 lcluster-sized pcluster for big pcluster
erofs: update documentation about data compression
erofs: fix broken illustration in documentation
Linus Torvalds [Sat, 15 May 2021 15:32:51 +0000 (08:32 -0700)]
Merge tag 'libnvdimm-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"A regression fix for a bootup crash condition introduced in this merge
window and some other minor fixups:
- Fix regression in ACPI NFIT table handling leading to crashes and
driver load failures.
- Move the nvdimm mailing list
- Miscellaneous minor fixups"
* tag 'libnvdimm-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ACPI: NFIT: Fix support for variable 'SPA' structure size
MAINTAINERS: Move nvdimm mailing list
tools/testing/nvdimm: Make symbol '__nfit_test_ioremap' static
libnvdimm: Remove duplicate struct declaration
Linus Torvalds [Sat, 15 May 2021 15:28:08 +0000 (08:28 -0700)]
Merge tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax fixes from Dan Williams:
"A fix for a hang condition due to missed wakeups in the filesystem-dax
core when exercised by virtiofs.
This bug has been there from the beginning, but the condition has
not triggered on other filesystems since they hold a lock over
invalidation events"
* tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: Wake up all waiters after invalidating dax entry
dax: Add a wakeup mode parameter to put_unlocked_entry()
dax: Add an enum for specifying dax wakup mode
Linus Torvalds [Sat, 15 May 2021 15:18:29 +0000 (08:18 -0700)]
Merge tag 'drm-fixes-2021-05-15' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Looks like I wasn't the only one not fully switched on this week. The
msm pull has a missing tag so I missed it, and i915 team were a bit
late. In my defence I did have a day with the roof of my home office
removed, so was sitting at my kids desk.
i915:
- Fix active callback alignment annotations and subsequent crashes
- Retract link training strategy to slow and wide, again
- Avoid division by zero on gen2
- Use correct width reads for C0DRB3/C1DRB3 registers
- Fix double free in pdp allocation failure path
- Fix HDMI 2.1 PCON downstream caps check"
* tag 'drm-fixes-2021-05-15' of git://anongit.freedesktop.org/drm/drm:
drm/i915: Use correct downstream caps for check Src-Ctl mode for PCON
drm/i915/overlay: Fix active retire callback alignment
drm/i915: Fix crash in auto_retire
drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
drm/i915: Avoid div-by-zero on gen2
drm/i915/dp: Use slow and wide link training for everything
drm/msm/dp: initialize audio_comp when audio starts
drm/msm/dp: check sink_count before update is_connected status
drm/msm: fix minor version to indicate MSM_PARAM_SUSPENDS support
drm/msm/dsi: fix msm_dsi_phy_get_clk_provider return code
drm/msm/dsi: dsi_phy_28nm_8960: fix uninitialized variable access
drm/msm: fix LLC not being enabled for mmu500 targets
drm/msm: Do not unpin/evict exported dma-buf's
syzbot is reporting OOB write at vga16fb_imageblit() [1], for
resize_screen() from ioctl(VT_RESIZE) returns 0 without checking whether
requested rows/columns fit the amount of memory reserved for the graphical
screen if current mode is KD_GRAPHICS.
Christophe Leroy [Sat, 15 May 2021 00:27:39 +0000 (17:27 -0700)]
mm/ioremap: fix iomap_max_page_shift
iomap_max_page_shift is expected to contain a page shift, so it can't be a
'bool', has to be an 'unsigned int'
And fix the default values: P4D_SHIFT is when huge iomap is allowed.
However, on some architectures (eg: powerpc book3s/64), P4D_SHIFT is not a
constant so it can't be used to initialise a static variable. So,
initialise iomap_max_page_shift with a maximum shift supported by the
architecture, it is gated by P4D_SHIFT in vmap_try_huge_p4d() anyway.
Link: https://lkml.kernel.org/r/ad2d366015794a9f21320dcbdd0a8eb98979e9df.1620898113.git.christophe.leroy@csgroup.eu Fixes: b9fa45504548 ("mm: HUGE_VMAP arch support cleanup") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rasmus Villemoes [Sat, 15 May 2021 00:27:36 +0000 (17:27 -0700)]
docs: admin-guide: update description for kernel.modprobe sysctl
When I added CONFIG_MODPROBE_PATH, I neglected to update Documentation/.
It's still true that this defaults to /sbin/modprobe, but now via a level
of indirection. So document that the kernel might have been built with
something other than /sbin/modprobe as the initial value.
Jouni Roivas [Sat, 15 May 2021 00:27:33 +0000 (17:27 -0700)]
hfsplus: prevent corruption in shrinking truncate
I believe there are some issues introduced by commit a88dc9f0615d
("hfsplus: avoid deadlock on file truncation")
HFS+ has extent records which always contains 8 extents. In case the
first extent record in catalog file gets full, new ones are allocated from
extents overflow file.
In case shrinking truncate happens to middle of an extent record which
locates in extents overflow file, the logic in hfsplus_file_truncate() was
changed so that call to hfs_brec_remove() is not guarded any more.
Right action would be just freeing the extents that exceed the new size
inside extent record by calling hfsplus_free_extents(), and then check if
the whole extent record should be removed. However since the guard
(blk_cnt > start) is now after the call to hfs_brec_remove(), this has
unfortunate effect that the last matching extent record is removed
unconditionally.
To reproduce this issue, create a file which has at least 10 extents, and
then perform shrinking truncate into middle of the last extent record, so
that the number of remaining extents is not under or divisible by 8. This
causes the last extent record (8 extents) to be removed totally instead of
truncating into middle of it. Thus this causes corruption, and lost data.
Fix for this is simply checking if the new truncated end is below the
start of this extent record, making it safe to remove the full extent
record. However call to hfs_brec_remove() can't be moved to it's previous
place since we're dropping ->tree_lock and it can cause a race condition
and the cached info being invalidated possibly corrupting the node data.
Another issue is related to this one. When entering into the block
(blk_cnt > start) we are not holding the ->tree_lock. We break out from
the loop not holding the lock, but hfs_find_exit() does unlock it. Not
sure if it's possible for someone else to take the lock under our feet,
but it can cause hard to debug errors and premature unlocking. Even if
there's no real risk of it, the locking should still always be kept in
balance. Thus taking the lock now just before the check.
A readahead request will not allocate more memory than can be represented
by a size_t, even on systems that have HIGHMEM available. Change the
length functions from returning an loff_t to a size_t.
Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org Fixes: 2fc26381224f60 ("btrfs: add and use readahead_batch_length") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
These tests deliberately access these arrays out of bounds, which will
cause the dynamic local bounds checks inserted by
CONFIG_UBSAN_LOCAL_BOUNDS to fail and panic the kernel. To avoid this
problem, access the arrays via volatile pointers, which will prevent the
compiler from being able to determine the array bounds.
These accesses use volatile pointers to char (char *volatile) rather than
the more conventional pointers to volatile char (volatile char *) because
we want to prevent the compiler from making inferences about the pointer
itself (i.e. its array bounds), not the data that it refers to.
Link: https://lkml.kernel.org/r/20210507025915.1464056-1-pcc@google.com Link: https://linux-review.googlesource.com/id/I90b1713fbfa1bf68ff895aef099ea77b98a7c3b9 Signed-off-by: Peter Collingbourne <pcc@google.com> Tested-by: Alexander Potapenko <glider@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Peter Collingbourne <pcc@google.com> Cc: George Popescu <georgepope@android.com> Cc: Elena Petrova <lenaptr@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
32-bit architectures which expect 8-byte alignment for 8-byte integers and
need 64-bit DMA addresses (arm, mips, ppc) had their struct page
inadvertently expanded in 2019. When the dma_addr_t was added, it forced
the alignment of the union to 8 bytes, which inserted a 4 byte gap between
'flags' and the union.
Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
This restores the alignment to that of an unsigned long. We always
store the low bits in the first word to prevent the PageTail bit from
being inadvertently set on a big endian platform. If that happened,
get_user_pages_fast() racing against a page which was freed and
reallocated to the page_pool could dereference a bogus compound_head(),
which would be hard to trace back to this cause.
Hugh Dickins [Sat, 15 May 2021 00:27:22 +0000 (17:27 -0700)]
ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()"
This reverts commit 3057d8b41693b4f30e5e055bcedd178844c2d260. General
Protection Fault in rmap_walk_ksm() under memory pressure:
remove_rmap_item_from_tree() needs to take page lock, of course.
Axel Rasmussen [Sat, 15 May 2021 00:27:19 +0000 (17:27 -0700)]
userfaultfd: release page in error path to avoid BUG_ON
Consider the following sequence of events:
1. Userspace issues a UFFD ioctl, which ends up calling into
shmem_mfill_atomic_pte(). We successfully account the blocks, we
shmem_alloc_page(), but then the copy_from_user() fails. We return
-ENOENT. We don't release the page we allocated.
2. Our caller detects this error code, tries the copy_from_user() after
dropping the mmap_lock, and retries, calling back into
shmem_mfill_atomic_pte().
3. Meanwhile, let's say another process filled up the tmpfs being used.
4. So shmem_mfill_atomic_pte() fails to account blocks this time, and
immediately returns - without releasing the page.
This triggers a BUG_ON in our caller, which asserts that the page
should always be consumed, unless -ENOENT is returned.
To fix this, detect if we have such a "dangling" page when accounting
fails, and if so, release it before returning.
Link: https://lkml.kernel.org/r/20210428230858.348400-1-axelrasmussen@google.com Fixes: 5e337a9e225f ("userfaultfd: shmem: avoid leaking blocks and used blocks in UFFDIO_COPY") Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reported-by: Hugh Dickins <hughd@google.com> Acked-by: Hugh Dickins <hughd@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Phillip Lougher [Sat, 15 May 2021 00:27:16 +0000 (17:27 -0700)]
squashfs: fix divide error in calculate_skip()
Sysbot has reported a "divide error" which has been identified as being
caused by a corrupted file_size value within the file inode. This value
has been corrupted to a much larger value than expected.
Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to
the file_size value corruption this overflows the int argument/variable in
that function, leading to the divide error.
This patch changes the function to use u64. This will accommodate any
unexpectedly large values due to corruption.
The value returned from calculate_skip() is clamped to be never more than
SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to
an unexpectedly large return result here.
Alistair Popple [Sat, 15 May 2021 00:27:13 +0000 (17:27 -0700)]
kernel/resource: fix return code check in __request_free_mem_region
Splitting an earlier version of a patch that allowed calling
__request_region() while holding the resource lock into a series of
patches required changing the return code for the newly introduced
__request_region_locked().
Unfortunately this change was not carried through to a subsequent commit 86b0e53b8317 ("kernel/resource: fix locking in request_free_mem_region")
in the series. This resulted in a use-after-free due to freeing the
struct resource without properly releasing it. Fix this by correcting the
return code check so that the struct is not freed if the request to add it
was successful.
Link: https://lkml.kernel.org/r/20210512073528.22334-1-apopple@nvidia.com Fixes: 86b0e53b8317 ("kernel/resource: fix locking in request_free_mem_region") Signed-off-by: Alistair Popple <apopple@nvidia.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Muchun Song <smuchun@gmail.com> Cc: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vlastimil Babka [Sat, 15 May 2021 00:27:10 +0000 (17:27 -0700)]
mm, slub: move slub_debug static key enabling outside slab_mutex
Paul E. McKenney reported [1] that commit 3c3e5bc3584d ("mm, slub: enable
slub_debug static key when creating cache with explicit debug flags")
results in the lockdep complaint:
======================================================
WARNING: possible circular locking dependency detected
5.12.0+ #15 Not tainted
------------------------------------------------------
rcu_torture_sta/109 is trying to acquire lock: ffffffff96063cd0 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_enable+0x9/0x20
but task is already holding lock: ffffffff96173c28 (slab_mutex){+.+.}-{3:3}, at: kmem_cache_create_usercopy+0x2d/0x250
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
This is because there's one order of locking from the hotplug callbacks:
lock(cpu_hotplug_lock); // from hotplug machinery itself
lock(slab_mutex); // in e.g. slab_mem_going_offline_callback()
And commit 3c3e5bc3584d made the reverse sequence possible:
lock(slab_mutex); // in kmem_cache_create_usercopy()
lock(cpu_hotplug_lock); // kmem_cache_open() -> static_key_enable()
The simplest fix is to move static_key_enable() to a place before slab_mutex is
taken. That means kmem_cache_create_usercopy() in mm/slab_common.c which is not
ideal for SLUB-specific code, but the #ifdef CONFIG_SLUB_DEBUG makes it
at least self-contained and obvious.
Link: https://lkml.kernel.org/r/20210504120019.26791-1-vbabka@suse.cz Fixes: 3c3e5bc3584d ("mm, slub: enable slub_debug static key when creating cache with explicit debug flags") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Xu [Sat, 15 May 2021 00:27:07 +0000 (17:27 -0700)]
mm/hugetlb: fix cow where page writtable in child
When rework early cow of pinned hugetlb pages, we moved huge_ptep_get()
upper but overlooked a side effect that the huge_ptep_get() will fetch the
pte after wr-protection. After moving it upwards, we need explicit
wr-protect of child pte or we will keep the write bit set in the child
process, which could cause data corrution where the child can write to the
original page directly.
This issue can also be exposed by "memfd_test hugetlbfs" kselftest.
Link: https://lkml.kernel.org/r/20210503234356.9097-3-peterx@redhat.com Fixes: 531a8f44d6100 ("hugetlb: do early cow when page pinned on src mm") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Xu [Sat, 15 May 2021 00:27:04 +0000 (17:27 -0700)]
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2.
Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to
hugetlbfs, which I can easily verify using the memfd_test program, which
seems that the program is hardly run with hugetlbfs pages (as by default
shmem).
Meanwhile I found another probably even more severe issue on that hugetlb
fork won't wr-protect child cow pages, so child can potentially write to
parent private pages. Patch 2 addresses that.
After this series applied, "memfd_test hugetlbfs" should start to pass.
This patch (of 2):
F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day.
There is a test program for that and it fails constantly.
Linus Torvalds [Fri, 14 May 2021 20:44:51 +0000 (13:44 -0700)]
Merge tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix trace_check_vprintf() for %.*s
The sanity check of all strings being read from the ring buffer to
make sure they are in safe memory space did not account for the %.*s
notation having another parameter to process (the length).
Add that to the check"
* tag 'trace-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Handle %.*s in trace_check_vprintf()
Dave Airlie [Fri, 14 May 2021 20:12:45 +0000 (06:12 +1000)]
Merge tag 'drm-intel-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fixes for v5.13-rc2:
- Fix active callback alignment annotations and subsequent crashes
- Retract link training strategy to slow and wide, again
- Avoid division by zero on gen2
- Use correct width reads for C0DRB3/C1DRB3 registers
- Fix double free in pdp allocation failure path
- Fix HDMI 2.1 PCON downstream caps check
Linus Torvalds [Fri, 14 May 2021 17:52:47 +0000 (10:52 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"Fixes and cpucaps.h automatic generation:
- Generate cpucaps.h at build time rather than carrying lots of
#defines. Merged at -rc1 to avoid some conflicts during the merge
window.
- Initialise RGSR_EL1.SEED in __cpu_setup() as it may be left as 0
out of reset and the IRG instruction would not function as expected
if only the architected pseudorandom number generator is
implemented.
- Fix potential race condition in __sync_icache_dcache() where the
PG_dcache_clean page flag is set before the actual cache
maintenance.
- Fix header include in BTI kselftests"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
arm64: tools: Add __ASM_CPUCAPS_H to the endif in cpucaps.h
arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
kselftest/arm64: Add missing stddef.h include to BTI tests
arm64: Generate cpucaps.h
Linus Torvalds [Fri, 14 May 2021 17:49:20 +0000 (10:49 -0700)]
Merge tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"This fixes some critical bugs such as memory leak in compression
flows, kernel panic when handling errors, and swapon failure due to
newly added condition check"
* tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: return EINVAL for hole cases in swap file
f2fs: avoid swapon failure by giving a warning first
f2fs: compress: fix to assign cc.cluster_idx correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to free compress page correctly
f2fs: support iflag change given the mask
f2fs: avoid null pointer access when handling IPU error
radeon:
- Fixes for flexible array conversions
- Fix for flickering on Oland with multiple 4K displays
vc4:
- drop unused function"
* tag 'drm-fixes-2021-05-14' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: update vcn1.0 Non-DPG suspend sequence
drm/amdgpu: set vcn mgcg flag for picasso
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
drm/amdgpu: update the method for harvest IP for specific SKU
drm/amdgpu: add judgement when add ip blocks (v2)
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
drm/amd/pm: Fix out-of-bounds bug
drm/radeon/si_dpm: Fix SMU power state load
drm/radeon/ni_dpm: Fix booting bug
MAINTAINERS: Update address for Emma Anholt
MAINTAINERS: Update my e-mail
drm/vc4: remove unused function
drm/ttm: Do not add non-system domain BO into swap list
Catalin Marinas [Fri, 14 May 2021 09:50:01 +0000 (10:50 +0100)]
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
To ensure that instructions are observable in a new mapping, the arm64
set_pte_at() implementation cleans the D-cache and invalidates the
I-cache to the PoU. As an optimisation, this is only done on executable
mappings and the PG_dcache_clean page flag is set to avoid future cache
maintenance on the same page.
When two different processes map the same page (e.g. private executable
file or shared mapping) there's a potential race on checking and setting
PG_dcache_clean via set_pte_at() -> __sync_icache_dcache(). While on the
fault paths the page is locked (PG_locked), mprotect() does not take the
page lock. The result is that one process may see the PG_dcache_clean
flag set but the I/D cache maintenance not yet performed.
Avoid test_and_set_bit(PG_dcache_clean) in favour of separate test_bit()
and set_bit(). In the rare event of a race, the cache maintenance is
done twice.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Steven Price <steven.price@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210514095001.13236-1-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Bart Van Assche [Thu, 13 May 2021 17:15:29 +0000 (10:15 -0700)]
blk-mq: Swap two calls in blk_mq_exit_queue()
If a tag set is shared across request queues (e.g. SCSI LUNs) then the
block layer core keeps track of the number of active request queues in
tags->active_queues. blk_mq_tag_busy() and blk_mq_tag_idle() update that
atomic counter if the hctx flag BLK_MQ_F_TAG_QUEUE_SHARED is set. Make
sure that blk_mq_exit_queue() calls blk_mq_tag_idle() before that flag is
cleared by blk_mq_del_queue_tag_set().
Cc: Christoph Hellwig <hch@infradead.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.com> Fixes: cfa670be5cd2 ("blk-mq: improve support for shared tags maps") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210513171529.7977-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Fri, 14 May 2021 02:20:52 +0000 (10:20 +0800)]
blk-mq: plug request for shared sbitmap
In case of shared sbitmap, request won't be held in plug list any more
sine commit 99a8dd3f1c35 ("blk-mq: Facilitate a shared sbitmap per
tagset"), this way makes request merge from flush plug list & batching
submission not possible, so cause performance regression.
Yanhui reports performance regression when running sequential IO
test(libaio, 16 jobs, 8 depth for each job) in VM, and the VM disk
is emulated with image stored on xfs/megaraid_sas.
Fix the issue by recovering original behavior to allow to hold request
in plug list.
Cc: Yanhui Ma <yama@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: kashyap.desai@broadcom.com Fixes: 99a8dd3f1c35 ("blk-mq: Facilitate a shared sbitmap per tagset") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514022052.1047665-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
xen/swiotlb: check if the swiotlb has already been initialized
xen_swiotlb_init calls swiotlb_late_init_with_tbl, which fails with
-ENOMEM if the swiotlb has already been initialized.
Add an explicit check io_tlb_default_mem != NULL at the beginning of
xen_swiotlb_init. If the swiotlb is already initialized print a warning
and return -EEXIST.
On x86, the error propagates.
On ARM, we don't actually need a special swiotlb buffer (yet), any
buffer would do. So ignore the error and continue.
arm64: do not set SWIOTLB_NO_FORCE when swiotlb is required
Although SWIOTLB_NO_FORCE is meant to allow later calls to swiotlb_init,
today dma_direct_map_page returns error if SWIOTLB_NO_FORCE.
For now, without a larger overhaul of SWIOTLB_NO_FORCE, the best we can
do is to avoid setting SWIOTLB_NO_FORCE in mem_init when we know that it
is going to be required later (e.g. Xen requires it).
Vitaly Kuznetsov [Thu, 13 May 2021 07:32:46 +0000 (09:32 +0200)]
clocksource/drivers/hyper-v: Re-enable VDSO_CLOCKMODE_HVCLOCK on X86
Mohammed reports (https://bugzilla.kernel.org/show_bug.cgi?id=213029)
the commit bccc999ea88e ("clocksource/drivers/hyper-v: Handle vDSO
differences inline") broke vDSO on x86. The problem appears to be that
VDSO_CLOCKMODE_HVCLOCK is an enum value in 'enum vdso_clock_mode' and
'#ifdef VDSO_CLOCKMODE_HVCLOCK' branch evaluates to false (it is not
a define).
Use a dedicated HAVE_VDSO_CLOCKMODE_HVCLOCK define instead.
Fixes: bccc999ea88e ("clocksource/drivers/hyper-v: Handle vDSO differences inline") Reported-by: Mohammed Gamal <mgamal@redhat.com> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/20210513073246.1715070-1-vkuznets@redhat.com
Pavel Begunkov [Fri, 14 May 2021 11:06:44 +0000 (12:06 +0100)]
io_uring: increase max number of reg buffers
Since recent changes instead of storing a large array of struct
io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and
we should not to so worry about restricting max number of registererd
buffer slots, increase the limit 4 times.
Pavel Begunkov [Fri, 14 May 2021 11:05:46 +0000 (12:05 +0100)]
io_uring: further remove sqpoll limits on opcodes
There are three types of requests that left disabled for sqpoll, namely
epoll ctx, statx, and resources update. Since SQPOLL task is now closely
mimics a userspace thread, remove the restrictions.
Pavel Begunkov [Fri, 14 May 2021 11:02:50 +0000 (12:02 +0100)]
io_uring: fix ltout double free on completion race
Always remove linked timeout on io_link_timeout_fn() from the master
request link list, otherwise we may get use-after-free when first
io_link_timeout_fn() puts linked timeout in the fail path, and then
will be found and put on master's free.
Nicholas Piggin [Fri, 14 May 2021 04:40:08 +0000 (14:40 +1000)]
powerpc/64e/interrupt: Fix nvgprs being clobbered
Some interrupt handlers have an "extra" that saves 1 or 2
registers (r14, r15) in the paca save area and makes them available to
use by the handler.
The change to always save nvgprs in exception handlers lead to some
interrupt handlers saving those scratch r14 / r15 registers into the
interrupt frame's GPR saves, which get restored on interrupt exit.
Fix this by always reloading those scratch registers from paca before
the EXCEPTION_COMMON that saves nvgprs.
Fixes: 3f27476060b0 ("powerpc/64e/interrupt: always save nvgprs on interrupt") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210514044008.1955783-1-npiggin@gmail.com
Nicholas Piggin [Mon, 3 May 2021 11:17:08 +0000 (21:17 +1000)]
powerpc/64s: Make NMI record implicitly soft-masked code as irqs disabled
scv support introduced the notion of code that implicitly soft-masks
irqs due to the instruction addresses. This is required because scv
enters the kernel with MSR[EE]=1.
If a NMI (including soft-NMI) interrupt hits when we are implicitly
soft-masked then its regs->softe does not reflect this because it is
derived from the explicit soft mask state (paca->irq_soft_mask). This
makes arch_irq_disabled_regs(regs) return false.
This can trigger a warning in the soft-NMI watchdog code (shown below).
Fix it by having NMI interrupts set regs->softe to disabled in case of
interrupting an implicit soft-masked region.
The stf entry barrier fallback is unsafe to execute in a semi-patched
state, which can happen when enabling/disabling the mitigation with
strict kernel RWX enabled and using the hash MMU.
See the previous commit for more details.
Fix it by changing the order in which we patch the instructions.
Note the stf barrier fallback is only used on Power6 or earlier.
The entry flush mitigation can be enabled/disabled at runtime. When this
happens it results in the kernel patching its own instructions to
enable/disable the mitigation sequence.
With strict kernel RWX enabled instruction patching happens via a
secondary mapping of the kernel text, so that we don't have to make the
primary mapping writable. With the hash MMU this leads to a hash fault,
which causes us to execute the exception entry which contains the entry
flush mitigation.
This means we end up executing the entry flush in a semi-patched state,
ie. after we have patched the first instruction but before we patch the
second or third instruction of the sequence.
On machines with updated firmware the entry flush is a series of special
nops, and it's safe to to execute in a semi-patched state.
However when using the fallback flush the sequence is mflr/branch/mtlr,
and so it's not safe to execute if we have patched out the mflr but not
the other two instructions. Doing so leads to us corrputing LR, leading
to an oops, for example:
The simplest fix is to change the order in which we patch the
instructions, so that the sequence is always safe to execute. For the
non-fallback flushes it doesn't matter what order we patch in.
powerpc/64s: Fix crashes when toggling entry flush barrier
The entry flush mitigation can be enabled/disabled at runtime via a
debugfs file (entry_flush), which causes the kernel to patch itself to
enable/disable the relevant mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
Shows that we returned to userspace with a corrupted LR that points into
the kernel, due to executing the partially patched call to the fallback
entry flush (ie. we missed the LR restore).
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
powerpc/64s: Fix crashes when toggling stf barrier
The STF (store-to-load forwarding) barrier mitigation can be
enabled/disabled at runtime via a debugfs file (stf_barrier), which
causes the kernel to patch itself to enable/disable the relevant
mitigations.
However depending on which mitigation we're using, it may not be safe to
do that patching while other CPUs are active. For example the following
crash:
Shows that we returned to userspace without restoring the user r13
value, due to executing the partially patched STF exit code.
Fix it by doing the patching under stop machine. The CPUs that aren't
doing the patching will be spinning in the core of the stop machine
logic. That is currently sufficient for our purposes, because none of
the patching we do is to that code or anywhere in the vicinity.
Linus Torvalds [Thu, 13 May 2021 19:28:10 +0000 (12:28 -0700)]
Merge tag 'pm-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These close a coverage gap in the intel_pstate driver and fix runtime
PM child count imbalance related to interactions with system-wide
suspend.
Specifics:
- Make intel_pstate work as expected on systems where the platform
firmware enables HWP even though the HWP EPP support is not
advertised (Rafael Wysocki).
- Fix possible runtime PM child count imbalance that may occur if
other runtime PM functions are called after invoking
pm_runtime_force_suspend() and before pm_runtime_force_resume()
is called (Tony Lindgren)"
* tag 'pm-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: runtime: Fix unpaired parent child_count for force_resume
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
Linus Torvalds [Thu, 13 May 2021 19:22:01 +0000 (12:22 -0700)]
Merge tag 'acpi-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert an unnecessary revert of an ACPI power management commit,
add a missing device ID to one of the lists and fix a possible memory
leak in an error path.
Specifics:
- Revert a revert of a recent ACPI power management change that does
not need to be reverted after all (Rafael Wysocki).
- Add missing fan device ID to the list of device IDs for which the
devices should not be put into the ACPI PM domain (Sumeet
Pawnikar).
- Fix possible memory leak in an error path in the ACPI device
enumeration code (Christophe JAILLET)"
* tag 'acpi-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: Add ACPI ID of Alder Lake Fan
ACPI: scan: Fix a memory leak in an error handling path
Revert "Revert "ACPI: scan: Turn off unused power resources during initialization""
If a trace event uses the %*.s notation, the trace_check_vprintf() will
fail and will warn about a bad processing of strings, because it does not
take into account the length field when processing the star (*) part.
Have it handle this case as well.
Linus Torvalds [Thu, 13 May 2021 18:12:51 +0000 (11:12 -0700)]
Merge branch 'resizex' (patches from Maciej)
Merge VT_RESIZEX fixes from Maciej Rozycki:
"I got to the bottom of the issue with VT_RESIZEX recently discussed
and came up with this small patch series, fixing an additional issue
that I originally thought might be broken VGA hardware emulation with
my laptop, which however turned out to be intertwined with the
original problem and also a regression introduced somewhat later.
The fix for that because the first patch, and then to make backporting
feasible I had to put a revert of the offending change from last
September next, followed by a proper fix for the framebuffer issue
that change had tried to address.
See individual change descriptions for details.
These have been verified with true VGA hardware (a Trident TVGA8900
ISA video adapter) using various combinations of `svgatextmode' and
`setfont' command invocations to change both the VT size and the font
size, and also switching between the text console and X11, both by
starting/stopping the X server and by switching between VTs.
All this to ensure bringing the behaviour of VGA text console back to
correct operation as it used to be with Linux 2.6.18"
* emailed patches from Maciej W. Rozycki <macro@orcam.me.uk>:
vt: Fix character height handling with VT_RESIZEX
vt_ioctl: Revert VT_RESIZEX parameter handling removal
vgacon: Record video mode changes with VT_RESIZEX
Restore the original intent of the VT_RESIZEX ioctl's `v_clin' parameter
which is the number of pixel rows per character (cell) rather than the
height of the font used.
For framebuffer devices the two values are always the same, because the
former is inferred from the latter one. For VGA used as a true text
mode device these two parameters are independent from each other: the
number of pixel rows per character is set in the CRT controller, while
font height is in fact hardwired to 32 pixel rows and fonts of heights
below that value are handled by padding their data with blanks when
loaded to hardware for use by the character generator. One can change
the setting in the CRT controller and it will update the screen contents
accordingly regardless of the font loaded.
The `v_clin' parameter is used by the `vgacon' driver to set the height
of the character cell and then the cursor position within. Make the
parameter explicit then, by defining a new `vc_cell_height' struct
member of `vc_data', set it instead of `vc_font.height' from `v_clin' in
the VT_RESIZEX ioctl, and then use it throughout the `vgacon' driver
except where actual font data is accessed which as noted above is
independent from the CRTC setting.
This way the framebuffer console driver is free to ignore the `v_clin'
parameter as irrelevant, as it always should have, avoiding any issues
attempts to give the parameter a meaning there could have caused, such
as one that has led to commit 1b718a8c3a8a ("vt_ioctl: make VT_RESIZEX
behave like VT_RESIZE"):
"syzbot is reporting UAF/OOB read at bit_putcs()/soft_cursor() [1][2],
for vt_resizex() from ioctl(VT_RESIZEX) allows setting font height
larger than actual font height calculated by con_font_set() from
ioctl(PIO_FONT). Since fbcon_set_font() from con_font_set() allocates
minimal amount of memory based on actual font height calculated by
con_font_set(), use of vt_resizex() can cause UAF/OOB read for font
data."
The problem first appeared around Linux 2.5.66 which predates our repo
history, but the origin could be identified with the old MIPS/Linux repo
also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git>
as commit 9736a3546de7 ("Merge with Linux 2.5.66."), where VT_RESIZEX
code in `vt_ioctl' was updated as follows:
if (clin)
- video_font_height = clin;
+ vc->vc_font.height = clin;
making the parameter apply to framebuffer devices as well, perhaps due
to the use of "font" in the name of the original `video_font_height'
variable. Use "cell" in the new struct member then to avoid ambiguity.
Revert the removal of code handling extra VT_RESIZEX ioctl's parameters
beyond those that VT_RESIZE supports, fixing a functional regression
causing `svgatextmode' not to resize the VT anymore.
As a consequence of the reverted change when the video adapter is
reprogrammed from the original say 80x25 text mode using a 9x16
character cell (720x400 pixel resolution) to say 80x37 text mode and the
same character cell (720x592 pixel resolution), the VT geometry does not
get updated and only upper two thirds of the screen are used for the VT,
and the lower part remains blank. The proportions change according to
text mode geometries chosen.
Revert the change verbatim then, bringing back previous VT resizing.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 1b718a8c3a8a ("vt_ioctl: make VT_RESIZEX behave like VT_RESIZE") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix an issue with VGA console font size changes made after the initial
video text mode has been changed with a user tool like `svgatextmode'
calling the VT_RESIZEX ioctl. As it stands in that case the original
screen geometry continues being used to validate further VT resizing.
Consequently when the video adapter is firstly reprogrammed from the
original say 80x25 text mode using a 9x16 character cell (720x400 pixel
resolution) to say 80x37 text mode and the same character cell (720x592
pixel resolution), and secondly the CRTC character cell updated to 9x8
(by loading a suitable font with the KD_FONT_OP_SET request of the
KDFONTOP ioctl), the VT geometry does not get further updated from 80x37
and only upper half of the screen is used for the VT, with the lower
half showing rubbish corresponding to whatever happens to be there in
the video memory that maps to that part of the screen. Of course the
proportions change according to text mode geometries and font sizes
chosen.
Address the problem then, by updating the text mode geometry defaults
rather than checking against them whenever the VT is resized via a user
ioctl.
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: 0e3b73acbaf3 ("vt/vgacon: Check if screen resize request comes from userspace") Cc: stable@vger.kernel.org # v2.6.24+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jens Axboe [Thu, 13 May 2021 17:07:17 +0000 (11:07 -0600)]
Merge tag 'nvme-5.13-2021-05-13' of git://git.infradead.org/nvme into block-5.13
Pull NVMe fixes from Christoph:
"nvme fix for Linux 5.13
- correct the check for using the inline bio in nvmet
(Chaitanya Kulkarni)
- demote unsupported command warnings (Chaitanya Kulkarni)
- fix corruption due to double initializing ANA state (me, Hou Pu)
- reset ns->file when open fails (Daniel Wagner)
- fix a NULL deref when SEND is completed with error in nvmet-rdma
(Michal Kalderon)"
* tag 'nvme-5.13-2021-05-13' of git://git.infradead.org/nvme:
nvmet: use new ana_log_size instead the old one
nvmet: seset ns->file when open fails
nvmet: demote fabrics cmd parse err msg to debug
nvmet: use helper to remove the duplicate code
nvmet: demote discovery cmd parse err msg to debug
nvmet-rdma: Fix NULL deref when SEND is completed with error
nvmet: fix inline bio check for passthru
nvmet: fix inline bio check for bdev-ns
nvme-multipath: fix double initialization of ANA state
Linus Torvalds [Thu, 13 May 2021 16:58:53 +0000 (09:58 -0700)]
Merge tag 'hwmon-for-v5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix bugs/regressions in adm9240, ltc2992, pmbus/fsp-3y, and occ
drivers, plus a minor cleanup in the corsair-psu driver"
* tag 'hwmon-for-v5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (adm9240) Fix writes into inX_max attributes
hwmon: (ltc2992) Put fwnode in error case during ->probe()
hwmon: (pmbus/fsp-3y) Fix FSP-3Y YH-5151E non-compliant vout encoding
hwmon: (occ) Fix poll rate limiting
hwmon: (corsair-psu) Remove unneeded semicolons
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Mon, 3 May 2021 07:04:10 +0000 (12:34 +0530)]
drm/amdgpu: set vcn mgcg flag for picasso
enable vcn mgcg flag for picasso.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
Screen flickers rapidly when two 4K 60Hz monitors are in use. This issue
doesn't happen when one monitor is 4K 60Hz (pixelclock 594MHz) and
another one is 4K 30Hz (pixelclock 297MHz).
The issue is gone after setting "power_dpm_force_performance_level" to
"high". Following the indication, we found that the issue occurs when
sclk is too low.
So resolve the issue by disabling sclk switching when there are two
monitors requires high pixelclock (> 297MHz).
v2:
- Only apply the fix to Oland. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Likun Gao [Fri, 7 May 2021 05:56:46 +0000 (13:56 +0800)]
drm/amdgpu: update the method for harvest IP for specific SKU
Update the method of disabling VCN IP for specific SKU for navi1x ASIC,
it will judge whether should add the related IP at the function of
amdgpu_device_ip_block_add().
David Ward [Mon, 10 May 2021 09:30:39 +0000 (05:30 -0400)]
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
It is stored in dynamically allocated memory, so sysfs_bin_attr_init() must
be called to initialize it. (Note: "initialization" only sets the .attr.key
member in this struct; it does not change the value of any other members.)
Otherwise, when CONFIG_DEBUG_LOCK_ALLOC=y this message appears during boot:
Create new structure SISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
SISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.
Currently, the code fails because flexible array _levels_ in
struct SISLANDS_SMC_SWSTATE doesn't allow for code that accesses
the first element of initialState.levels and ACPIState.levels
arrays:
because such element cannot be accessed without previously allocating
enough dynamic memory for it to exist (which never actually happens).
So, there is an out-of-bounds bug in this case.
That's why struct SISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.
Also, with the change from one-element array to flexible-array member
in commit 8dc7032f4f71 ("drm/amd/pm: Replace one-element array with
flexible-array in struct SISLANDS_SMC_SWSTATE"), the size of
dpmLevels in struct SISLANDS_SMC_STATETABLE should be fixed to be
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE - 1.
Fixes: 8dc7032f4f71 ("drm/amd/pm: Replace one-element array with flexible-array in struct SISLANDS_SMC_SWSTATE") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Create new structure SISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
SISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.
Currently, the code fails because flexible array _levels_ in
struct SISLANDS_SMC_SWSTATE doesn't allow for code that access
the first element of initialState.levels and ACPIState.levels
arrays:
because such element cannot exist without previously allocating
any dynamic memory for it (which never actually happens).
That's why struct SISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.
Also, with the change from one-element array to flexible-array member
in commit 891ce6576b63 ("drm/radeon/si_dpm: Replace one-element array
with flexible-array in struct SISLANDS_SMC_SWSTATE"), the size of
dpmLevels in struct SISLANDS_SMC_STATETABLE should be fixed to be
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
SISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE - 1.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1583 Fixes: 891ce6576b63 ("drm/radeon/si_dpm: Replace one-element array with flexible-array in struct SISLANDS_SMC_SWSTATE") Cc: stable@vger.kernel.org Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Create new structure NISLANDS_SMC_SWSTATE_SINGLE, as initialState.levels
and ACPIState.levels are never actually used as flexible arrays. Those
arrays can be used as simple objects of type
NISLANDS_SMC_HW_PERFORMANCE_LEVEL, instead.
Currently, the code fails because flexible array _levels_ in
struct NISLANDS_SMC_SWSTATE doesn't allow for code that access
the first element of initialState.levels and ACPIState.levels
arrays:
because such element cannot exist without previously allocating
any dynamic memory for it (which never actually happens).
That's why struct NISLANDS_SMC_SWSTATE should only be used as type
for object driverState and new struct SISLANDS_SMC_SWSTATE_SINGLE is
created as type for objects initialState, ACPIState and ULVState.
Also, with the change from one-element array to flexible-array member
in commit bc4ab2ce06e3 ("drm/radeon/nislands_smc.h: Replace one-element
array with flexible-array member in struct NISLANDS_SMC_SWSTATE"), the
size of dpmLevels in struct NISLANDS_SMC_STATETABLE should be fixed to
be NISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE instead of
NISLANDS_MAX_SMC_PERFORMANCE_LEVELS_PER_SWSTATE - 1.
Bug: https://lore.kernel.org/dri-devel/3eedbe78-1fbd-4763-a7f3-ac5665e76a4a@xenosoft.de/ Fixes: bc4ab2ce06e3 ("drm/radeon/nislands_smc.h: Replace one-element array with flexible-array member in struct NISLANDS_SMC_SWSTATE") Cc: stable@vger.kernel.org Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Link: https://lore.kernel.org/dri-devel/9bb5fcbd-daf5-1669-b3e7-b8624b3c36f9@xenosoft.de/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 684bf36ce222 ("nvme-multipath: fix double initialization of ANA state") Signed-off-by: Hou Pu <houpu.main@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Chunfeng Yun [Wed, 12 May 2021 02:07:38 +0000 (10:07 +0800)]
usb: core: hub: fix race condition about TRSMRCY of resume
This may happen if the port becomes resume status exactly
when usb_port_resume() gets port status, it still need provide
a TRSMCRY time before access the device.
usb: typec: tcpm: Fix SINK_DISCOVERY current limit for Rp-default
This is a regression introduced by ebc6b55b73c6 ("usb: typec: tcpm:
Allow slow charging loops to comply to pSnkStby")
When Source advertises Rp-default, tcpm would request 500mA when in
SINK_DISCOVERY, Type-C spec advises the sink to follow BC1.2 current
limits when Rp-default is advertised.
[12750.503381] Requesting mux state 1, usb-role 2, orientation 1
[12750.503837] state change SNK_ATTACHED -> SNK_STARTUP [rev3 NONE_AMS]
[12751.003891] state change SNK_STARTUP -> SNK_DISCOVERY
[12751.003900] Setting voltage/current limit 5000 mV 500 mA
This patch restores the behavior where the tcpm would request 0mA when
Rp-default is advertised by the source.
[ 73.174252] Requesting mux state 1, usb-role 2, orientation 1
[ 73.174749] state change SNK_ATTACHED -> SNK_STARTUP [rev3 NONE_AMS]
[ 73.674800] state change SNK_STARTUP -> SNK_DISCOVERY
[ 73.674808] Setting voltage/current limit 5000 mV 0 mA
During SNK_DISCOVERY, Cap the current limit to PD_P_SNK_STDBY_MW / 5 only
for slow_charger_loop case.
Sandeep Singh [Wed, 12 May 2021 08:08:16 +0000 (11:08 +0300)]
xhci: Add reset resume quirk for AMD xhci controller.
One of AMD xhci controller require reset on resume.
Occasionally AMD xhci controller does not respond to
Stop endpoint command.
Once the issue happens controller goes into bad state
and in that case controller needs to be reset.
Maximilian Luz [Wed, 12 May 2021 08:08:15 +0000 (11:08 +0300)]
usb: xhci: Increase timeout for HC halt
On some devices (specifically the SC8180x based Surface Pro X with
QCOM04A6) HC halt / xhci_halt() times out during boot. Manually binding
the xhci-hcd driver at some point later does not exhibit this behavior.
To work around this, double XHCI_MAX_HALT_USEC, which also resolves this
issue.
xhci: Do not use GFP_KERNEL in (potentially) atomic context
'xhci_urb_enqueue()' is passed a 'mem_flags' argument, because "URBs may be
submitted in interrupt context" (see comment related to 'usb_submit_urb()'
in 'drivers/usb/core/urb.c')
So this flag should be used in all the calling chain.
Up to now, 'xhci_check_maxpacket()' which is only called from
'xhci_urb_enqueue()', uses GFP_KERNEL.
Be safe and pass the mem_flags to this function as well.
Fixes: 0ab9590c204c ("xhci: Use command structures when queuing commands on the command ring") Cc: <stable@vger.kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20210512080816.866037-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mathias Nyman [Wed, 12 May 2021 08:08:13 +0000 (11:08 +0300)]
xhci: Fix giving back cancelled URBs even if halted endpoint can't reset
Commit 2167831515c3 ("xhci: Fix halted endpoint at stop endpoint command
completion") in 5.12 changes how cancelled URBs are given back.
To cancel a URB xhci driver needs to stop the endpoint first.
To clear a halted endpoint xhci driver needs to reset the endpoint.
In rare cases when an endpoint halt (error) races with a endpoint stop we
need to clear the reset before removing, and giving back the cancelled URB.
The above change in 5.12 takes care of this, but it also relies on the
reset endpoint completion handler to give back the cancelled URBs.
There are cases when driver refuses to queue reset endpoint commands,
for example when a link suddenly goes to an inactive error state.
In this case the cancelled URB is never given back.
Fix this by giving back the URB in the stop endpoint if queuing a reset
endpoint command fails.
Abhijeet Rao [Wed, 12 May 2021 08:08:12 +0000 (11:08 +0300)]
xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
In the same way as Intel Tiger Lake TCSS (Type-C Subsystem) the Alder Lake
TCSS xHCI needs to be runtime suspended whenever possible to allow the
TCSS hardware block to enter D3cold and thus save energy.
x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations
Some AMD Ryzen generations has different calculation method on maximum
performance. 255 is not for all ASICs, some specific generations should use 166
as the maximum performance. Otherwise, it will report incorrect frequency value
like below:
~ → lscpu | grep MHz
CPU MHz: 3400.000
CPU max MHz: 7228.3198
CPU min MHz: 2200.0000
[ mingo: Tidied up whitespace use. ]
[ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ]
Fixes: 3f94ec5f682f ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: e09de229d190 ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies") Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com> Fixed-by: Alexander Monakov <amonakov@ispras.ru> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791 Signed-off-by: Ingo Molnar <mingo@kernel.org>
Gao Xiang [Mon, 10 May 2021 06:47:15 +0000 (14:47 +0800)]
erofs: fix 1 lcluster-sized pcluster for big pcluster
If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type
rather than a HEAD or PLAIN type instead, which means its pclustersize
_must_ be 1 lcluster (since its uncompressed size < 2 lclusters),
as illustrated below:
HEAD HEAD / PLAIN lcluster type
____________ ____________
|_:__________|_________:__| file data (uncompressed)
. .
.____________.
|____________| pcluster data (compressed)
Such on-disk case was explained before [1] but missed to be handled
properly in the runtime implementation.
It can be observed if manually generating 1 lcluster-sized pcluster
with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now.
Guenter Roeck [Wed, 12 May 2021 22:48:09 +0000 (15:48 -0700)]
hwmon: (adm9240) Fix writes into inX_max attributes
When converting the driver to use the devm_hwmon_device_register_with_info
API, the wrong register was selected when writing into inX_max attributes.
Fix it.
Fixes: 1d251f4e58e0 ("hwmon: (adm9240) Convert to devm_hwmon_device_register_with_info API") Reported-by: Chris Packham <Chris.Packham@alliedtelesis.co.nz> Tested-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Guenter Roeck <linux@roeck-us.net>