memstick/mspro_block: fix handling of read-only devices
Use set_disk_ro to propagate the read-only state to the block layer
instead of checking for it in ->open and leaking a reference in case
of a read-only device.
Add a method to notify the driver that the gendisk is about to be freed.
This allows drivers to tie the lifetime of their private data to that of
the gendisk and thus deal with device removal races without expensive
synchronization and boilerplate code.
A new flag is added so that ->free_disk is only called after a successful
call to add_disk, which significantly simplifies the error handling path
during probing.
Ming Lei [Wed, 16 Feb 2022 04:45:14 +0000 (12:45 +0800)]
block: revert 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios")
Revert commit 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large
IO scenarios") since we have another easier way to address this issue and
get better iops throttling result.
Ming Lei [Wed, 16 Feb 2022 04:45:13 +0000 (12:45 +0800)]
block: don't try to throttle split bio if iops limit isn't set
We need to throttle split bio in case of IOPS limit even though the
split bio has been marked as BIO_THROTTLED since block layer
accounts split bio actually.
If only throughput throttle is setup, no need to throttle any more
if BIO_THROTTLED is set since we have accounted & considered the
whole bio bytes already.
Add one flag of THROTL_TG_HAS_IOPS_LIMIT for serving this purpose.
Ming Lei [Wed, 16 Feb 2022 04:45:12 +0000 (12:45 +0800)]
block: throttle split bio in case of iops limit
Commit 111be8839817 ("block-throttle: avoid double charge") marks bio as
BIO_THROTTLED unconditionally if __blk_throtl_bio() is called on this bio,
then this bio won't be called into __blk_throtl_bio() any more. This way
is to avoid double charge in case of bio splitting. It is reasonable for
read/write throughput limit, but not reasonable for IOPS limit because
block layer provides io accounting against split bio.
Chunguang Xu has already observed this issue and fixed it in commit 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios").
However, that patch only covers bio splitting in __blk_queue_split(), and
we have other kind of bio splitting, such as bio_split() &
submit_bio_noacct() and other ways.
This patch tries to fix the issue in one generic way by always charging
the bio for iops limit in blk_throtl_bio(). This way is reasonable:
re-submission & fast-cloned bio is charged if it is submitted to same
disk/queue, and BIO_THROTTLED will be cleared if bio->bi_bdev is changed.
This new approach can get much more smooth/stable iops limit compared with
commit 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO
scenarios") since that commit can't throttle current split bios actually.
Also this way won't cause new double bio iops charge in
blk_throtl_dispatch_work_fn() in which blk_throtl_bio() won't be called
any more.
Reported-by: Ning Li <lining2020x@163.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Chunguang Xu <brookxu@tencent.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220216044514.2903784-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Ming Lei [Wed, 16 Feb 2022 04:45:09 +0000 (12:45 +0800)]
block: don't declare submit_bio_checks in local header
submit_bio_checks() won't be called outside of block/blk-core.c any more
since commit 9d497e2941c3 ("block: don't protect submit_bio_checks by
q_usage_counter"), so mark it as one local helper.
Fold dm_dispatch_clone_request into it's only caller, and use a switch
statement to single dispatch for the handling of the different return
values from blk_insert_cloned_request.
dm: remove useless code from dm_dispatch_clone_request
Both ->start_time_ns and the RQF_IO_STAT are set when the request is
allocated using blk_mq_alloc_request by dm-mpath in blk_mq_rq_ctx_init.
The block layer also ensures ->start_time_ns is only set when actually
needed.
The code to stack blk-mq drivers is only used by dm-multipath, and
will preferably stay that way. Make it optional and only selected
by device mapper, so that the buildbots more easily catch abuses
like the one that slipped in in the ufs driver in the last merged
window. Another positive side effects is that kernel builds without
device mapper shrink a little bit as well.
Based on the comment present in the bdev_get_queue()
bdev->bd_queue can never be NULL. Remove the NULL check for the local
variable q that is set from bdev_get_queue() for discard, write_same,
and write_zeroes.
This document is completely out of date and extremely misleading. In
general the existing kerneldoc comment serve as a much better
documentation of the still existing functionality, while the history
blurbs are pretty much irrelevant today.
Barry Song [Mon, 7 Feb 2022 07:49:31 +0000 (15:49 +0800)]
docs: block: biodoc.rst: Drop the obsolete and incorrect content
Since commit 7eaceaccab5f ("block: remove per-queue plugging"), kernel
has removed blk_run_address_space(), blk_unplug() and sync_buffer(),
and moved to on-stack plugging. The document has been obsolete for
years.
Given that there is no obvious counterparts in the new mechinism to
replace old APIs, this patch drops the content directly.
Yang Shi [Thu, 10 Feb 2022 22:52:22 +0000 (14:52 -0800)]
block: introduce block_rq_error tracepoint
Currently, rasdaemon uses the existing tracepoint block_rq_complete
and filters out non-error cases in order to capture block disk errors.
But there are a few problems with this approach:
1. Even kernel trace filter could do the filtering work, there is
still some overhead after we enable this tracepoint.
2. The filter is merely based on errno, which does not align with kernel
logic to check the errors for print_req_error().
3. block_rq_complete only provides dev major and minor to identify
the block device, it is not convenient to use in user-space.
So introduce a new tracepoint block_rq_error just for the error case.
With this patch, rasdaemon could switch to block_rq_error.
Since the new tracepoint has the similar implementation with
block_rq_complete, so move the existing code from TRACE_EVENT
block_rq_complete() into new event class block_rq_completion(). Then add
event for block_rq_complete and block_rq_err respectively from the newly
created event class per the suggestion from Chaitanya Kulkarni.
Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220210225222.260069-1-shy828301@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
John Garry [Tue, 8 Feb 2022 12:07:04 +0000 (20:07 +0800)]
sbitmap: Delete old sbitmap_queue_get_shallow()
Since __sbitmap_queue_get_shallow() was introduced in commit c05e66733788
("sbitmap: add sbitmap_get_shallow() operation"), it has not been used.
Delete __sbitmap_queue_get_shallow() and rename public
__sbitmap_queue_get_shallow() -> sbitmap_queue_get_shallow() as it is odd
to have public __foo but no foo at all.
Ming Lei [Mon, 10 Jan 2022 07:29:45 +0000 (15:29 +0800)]
lib/sbitmap: kill 'depth' from sbitmap_word
Only the last sbitmap_word can have different depth, and all the others
must have same depth of 1U << sb->shift, so not necessary to store it in
sbitmap_word, and it can be retrieved easily and efficiently by adding
one internal helper of __map_depth(sb, index).
Remove 'depth' field from sbitmap_word, then the annotation of
____cacheline_aligned_in_smp for 'word' isn't needed any more.
Not see performance effect when running high parallel IOPS test on
null_blk.
This way saves us one cacheline(usually 64 words) per each sbitmap_word.
Cc: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace open coded bio_clone_fast implementations with the actual helper.
Note that the bio allocated as part of the dm_io structure in alloc_io
will only actually be used later in alloc_tio, making this earlier
cloning of the information safe.
block: clone crypto and integrity data in __bio_clone_fast
__bio_clone_fast should also clone integrity and crypto data, as a clone
without those is incomplete. Right now the only caller that can actually
support crypto and integrity data (dm) does it manually for the one
callchain that supports these, but we better do it properly in the core.
Note that all callers except for the above mentioned one also don't need
to handle failure at all, given that the integrity and crypto clones are
based on mempool allocations that won't fail for sleeping allocations.
Song Liu [Thu, 3 Feb 2022 19:28:25 +0000 (11:28 -0800)]
block: introduce BLK_STS_OFFLINE
Currently, drivers reports BLK_STS_IOERR for devices that are not full
online or being removed. This behavior could cause confusion for users,
as they are not really I/O errors from the device.
Solve this issue with a new state BLK_STS_OFFLINE, which reports "device
offline error" in dmesg instead of "I/O error".
EIO is intentionally kept to not change user visible return value.
Dan Carpenter [Fri, 28 Jan 2022 14:09:22 +0000 (17:09 +0300)]
fs/ntfs3: remove unnecessary NULL check
This code triggers a Smatch warning:
fs/ntfs3/fsntfs.c:1606 ntfs_bio_fill_1()
warn: variable dereferenced before check 'bio' (see line 1591)
The "bio" pointer cannot be NULL so there is no need to check.
Originally there was more extensive NULL checking but it was removed
because bio_alloc() will never fail if it is allowed to sleep.
Remove this check as well.
Fixes: 39146b6f66ba ("ntfs3: remove ntfs_alloc_bio") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220128140922.GA29766@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
blk_needs_flush_plug fails to account for the cb_list, which needs
flushing as well. Remove it and just check if there is a plug instead
of poking into the internals of the plug structure.
Pass the block_device that we plan to use this bio for and the
operation to bio_reset to optimize the assigment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.
Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.
Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment. NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.
Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.
block: pass a block_device and opf to bio_alloc_bioset
Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.
Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.
The memory mapped in process_rdma is contiguous, so there is no need
to loop over bio_add_page. Remove rnbd_bio_map_kern and just open code
the bio allocation and mapping in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jack Wang <jinpu.wang@ionons.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20220124091107.642561-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just open code it next to the bio allocations, which saves a few lines
of code, prepares for future changes and allows to remove the duplicate
bi_opf assignment for the bio_clone_fast case in kcryptd_io_read.
bio_alloc will never fail if it is allowed to sleep, so there is no
need for this loop. Also remove the __GFP_HIGH specifier as it doesn't
make sense here given that we'll always fall back to the mempool anyway.
open code mpage_alloc in it's two callers and simplify the results
because of the context:
- __mpage_writepage always passes GFP_NOFS and can thus always sleep and
will never get a NULL return from bio_alloc at all.
- do_mpage_readpage can only get a non-sleeping context for readahead
which never sets PF_MEMALLOC and thus doesn't need the retry loop
either.
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it. So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Sun, 30 Jan 2022 13:12:02 +0000 (15:12 +0200)]
Merge tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Drop an unused private data field in the AIC driver
- Various fixes to the realtek-rtl driver
- Make the GICv3 ITS driver compile again in !SMP configurations
- Force reset of the GICv3 ITSs at probe time to avoid issues during kexec
- Yet another kfree/bitmap_free conversion
- Various DT updates (Renesas, SiFive)
* tag 'irq_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
dt-bindings: interrupt-controller: sifive,plic: Group interrupt tuples
dt-bindings: interrupt-controller: sifive,plic: Fix number of interrupts
dt-bindings: irqchip: renesas-irqc: Add R-Car V3U support
irqchip/gic-v3-its: Reset each ITS's BASERn register before probe
irqchip/gic-v3-its: Fix build for !SMP
irqchip/loongson-pch-ms: Use bitmap_free() to free bitmap
irqchip/realtek-rtl: Service all pending interrupts
irqchip/realtek-rtl: Fix off-by-one in routing
irqchip/realtek-rtl: Map control data to virq
irqchip/apple-aic: Drop unused ipi_hwirq field
Linus Torvalds [Sun, 30 Jan 2022 13:02:32 +0000 (15:02 +0200)]
Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Prevent accesses to the per-CPU cgroup context list from another CPU
except the one it belongs to, to avoid list corruption
- Make sure parent events are always woken up to avoid indefinite hangs
in the traced workload
* tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix cgroup event list management
perf: Always wake the parent event
Linus Torvalds [Sun, 30 Jan 2022 11:09:00 +0000 (13:09 +0200)]
Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
"Make sure the membarrier-rseq fence commands are part of the reported
set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"
* tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
Linus Torvalds [Sun, 30 Jan 2022 10:55:06 +0000 (12:55 +0200)]
Merge tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add another Intel CPU model to the list of CPUs supporting the
processor inventory unique number
- Allow writing to MCE thresholding sysfs files again - a previous
change had accidentally disabled it and no one noticed. Goes to show
how much is this stuff used
* tag 'x86_urgent_for_v5.17_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN
x86/MCE/AMD: Allow thresholding interface updates after init
Linus Torvalds [Sun, 30 Jan 2022 09:21:50 +0000 (11:21 +0200)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"12 patches.
Subsystems affected by this patch series: sysctl, binfmt, ia64, mm
(memory-failure, folios, kasan, and psi), selftests, and ocfs2"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
ocfs2: fix a deadlock when commit trans
jbd2: export jbd2_journal_[grab|put]_journal_head
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
psi: fix "no previous prototype" warnings when CONFIG_CGROUPS=n
mm, kasan: use compare-exchange operation to set KASAN page tag
kasan: test: fix compatibility with FORTIFY_SOURCE
tools/testing/scatterlist: add missing defines
mm: page->mapping folio->mapping should have the same offset
memory-failure: fetch compound_head after pgmap_pfn_valid()
ia64: make IA64_MCA_RECOVERY bool instead of tristate
binfmt_misc: fix crash when load/unload module
include/linux/sysctl.h: fix register_sysctl_mount_point() return type
Joseph Qi [Sat, 29 Jan 2022 21:41:23 +0000 (13:41 -0800)]
jbd2: export jbd2_journal_[grab|put]_journal_head
Patch series "ocfs2: fix a deadlock case".
This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols
jbd2_journal_[grab|put]_journal_head as preparation and later use them
in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the
deadlock.
This patch (of 2):
This exports symbols jbd2_journal_[grab|put]_journal_head, which will be
used outside modules, e.g. ocfs2.
Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
When CONFIG_PROC_FS is disabled psi code generates the following
warnings:
kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=]
1364 | static const struct proc_ops psi_cpu_proc_ops = {
| ^~~~~~~~~~~~~~~~
kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=]
1355 | static const struct proc_ops psi_memory_proc_ops = {
| ^~~~~~~~~~~~~~~~~~~
kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=]
1346 | static const struct proc_ops psi_io_proc_ops = {
| ^~~~~~~~~~~~~~~
Make definitions of these structures and related functions conditional
on CONFIG_PROC_FS config.
Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com Fixes: 0e94682b73bf ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm, kasan: use compare-exchange operation to set KASAN page tag
It has been reported that the tag setting operation on newly-allocated
pages can cause the page flags to be corrupted when performed
concurrently with other flag updates as a result of the use of
non-atomic operations.
Fix the problem by using a compare-exchange loop to update the tag.
Marco Elver [Sat, 29 Jan 2022 21:41:11 +0000 (13:41 -0800)]
kasan: test: fix compatibility with FORTIFY_SOURCE
With CONFIG_FORTIFY_SOURCE enabled, string functions will also perform
dynamic checks using __builtin_object_size(ptr), which when failed will
panic the kernel.
Because the KASAN test deliberately performs out-of-bounds operations,
the kernel panics with FORTIFY_SOURCE, for example:
Fix it by also hiding `ptr` from the optimizer, which will ensure that
__builtin_object_size() does not return a valid size, preventing
fortified string functions from panicking.
Maor Gottlieb [Sat, 29 Jan 2022 21:41:07 +0000 (13:41 -0800)]
tools/testing/scatterlist: add missing defines
The cited commits replaced preemptible with pagefault_disabled and
flush_kernel_dcache_page with flush_dcache_page respectively, hence need
to update the corresponding defines in the test.
scatterlist.c: In function ‘sg_miter_stop’:
scatterlist.c:919:4: warning: implicit declaration of function ‘flush_dcache_page’ [-Wimplicit-function-declaration]
flush_dcache_page(miter->page);
^~~~~~~~~~~~~~~~~
In file included from linux/scatterlist.h:8:0,
from scatterlist.c:9:
scatterlist.c:922:18: warning: implicit declaration of function ‘pagefault_disabled’ [-Wimplicit-function-declaration]
WARN_ON_ONCE(!pagefault_disabled());
^
linux/mm.h:23:25: note: in definition of macro ‘WARN_ON_ONCE’
int __ret_warn_on = !!(condition); \
^~~~~~~~~
Link: https://lkml.kernel.org/r/20220118082105.1737320-1-maorg@nvidia.com Fixes: 723aca208516 ("mm/scatterlist: replace the !preemptible warning in sg_miter_stop()") Fixes: 0e84f5dbf8d6 ("scatterlist: replace flush_kernel_dcache_page with flush_dcache_page") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yang [Sat, 29 Jan 2022 21:41:04 +0000 (13:41 -0800)]
mm: page->mapping folio->mapping should have the same offset
As with the other members of folio, the offset of page->mapping and
folio->mapping must be the same. The compile-time check was
inadvertently removed during development. Add it back.
[willy@infradead.org: changelog redo]
Link: https://lkml.kernel.org/r/20220104011734.21714-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joao Martins [Sat, 29 Jan 2022 21:41:01 +0000 (13:41 -0800)]
memory-failure: fetch compound_head after pgmap_pfn_valid()
memory_failure_dev_pagemap() at the moment assumes base pages (e.g.
dax_lock_page()). For devmap with compound pages fetch the
compound_head in case a tail page memory failure is being handled.
Currently this is a nop, but in the advent of compound pages in
dev_pagemap it allows memory_failure_dev_pagemap() to keep working.
Without this fix memory-failure handling (i.e. MCEs on pmem) with
device-dax configured namespaces will regress (and crash).
Link: https://lkml.kernel.org/r/20211202204422.26777-2-joao.m.martins@oracle.com Reported-by: Jane Chu <jane.chu@oracle.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Sat, 29 Jan 2022 21:40:58 +0000 (13:40 -0800)]
ia64: make IA64_MCA_RECOVERY bool instead of tristate
In linux-next, IA64_MCA_RECOVERY uses the (new) function
make_task_dead(), which is not exported for use by modules. Instead of
exporting it for one user, convert IA64_MCA_RECOVERY to be a bool
Kconfig symbol.
In a config file from "kernel test robot <lkp@intel.com>" for a
different problem, this linker error was exposed when
CONFIG_IA64_MCA_RECOVERY=m.
Link: https://lkml.kernel.org/r/20220124213129.29306-1-rdunlap@infradead.org Fixes: 0e25498f8cd4 ("exit: Add and use make_task_dead.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tong Zhang [Sat, 29 Jan 2022 21:40:55 +0000 (13:40 -0800)]
binfmt_misc: fix crash when load/unload module
We should unregister the table upon module unload otherwise something
horrible will happen when we load binfmt_misc module again. Also note
that we should keep value returned by register_sysctl_mount_point() and
release it later, otherwise it will leak.
Also, per Christian's comment, to fully restore the old behavior that
won't break userspace the check(binfmt_misc_header) should be
eliminated.
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
binfmt_misc: Failed to create fs/binfmt_misc sysctl mount point
BUG: unable to handle page fault for address: fffffbfff8004802
Call Trace:
init_misc_binfmt+0x2d/0x1000 [binfmt_misc]
Link: https://lkml.kernel.org/r/20220124181812.1869535-2-ztong0001@gmail.com Fixes: 3ba442d5331f ("fs: move binfmt_misc sysctl to its own file") Signed-off-by: Tong Zhang <ztong0001@gmail.com> Co-developed-by: Christian Brauner<brauner@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Iurii Zaikin <yzaikin@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 29 Jan 2022 17:05:47 +0000 (19:05 +0200)]
Merge tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fixes from Bjorn Helgaas:
- Fix compilation warnings in new mt7621 driver (Sergio Paracuellos)
- Restore the sysfs "rom" file for VGA shadow ROMs, which was broken
when converting "rom" to be a static attribute (Bjorn Helgaas)
* tag 'pci-v5.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/sysfs: Find shadow ROM before static attribute initialization
PCI: mt7621: Remove unused function pcie_rmw()
PCI: mt7621: Drop of_match_ptr() to avoid unused variable
Linus Torvalds [Sat, 29 Jan 2022 13:45:33 +0000 (15:45 +0200)]
Merge tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"Two fixes for the gpio-simulator:
- fix a bug with hogs not being set-up in gpio-sim when user-space
sets the chip label to an empty string
- include the gpio-sim documentation in the index"
* tag 'gpio-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: sim: add doc file to index file
gpio: sim: check the label length when setting up device properties
Linus Torvalds [Sat, 29 Jan 2022 13:34:04 +0000 (15:34 +0200)]
Merge tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are two small char/misc driver fixes for 5.17-rc2 that fix some
reported issues. They are:
- fix up a merge issue in the at25.c driver that ended up dropping
some lines in the driver. The removed lines ended being needed, so
this restores it and the driver works again.
- counter core fix where the wrong error was being returned, NULL
should be the correct error for when memory is gone here, like the
kmalloc() core does.
Both of these have been in linux-next this week with no reported
issues"
* tag 'char-misc-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
counter: fix an IS_ERR() vs NULL bug
eeprom: at25: Restore missing allocation
Linus Torvalds [Sat, 29 Jan 2022 13:23:13 +0000 (15:23 +0200)]
Merge tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small bug fixes and reverts for reported problems with
the tty core and drivers. They include:
- revert the fifo use for the 8250 console mode. It caused too many
regressions and problems, and had a bug in it as well. This is
being reworked and should show up in a later -rc1 release, but it's
not ready for 5.17
- rpmsg tty race fix
- restore the cyclades.h uapi header file. Turns out a compiler test
suite used it for some unknown reason. Bring it back just for the
parts that are used by the builder test so they continue to build.
No functionality is restored as no one actually has this hardware
anymore, nor is it really tested.
- stm32 driver fixes
- n_gsm flow control fixes
- pl011 driver fix
- rs485 initialization fix
All of these have been in linux-next this week with no reported
problems"
* tag 'tty-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
kbuild: remove include/linux/cyclades.h from header file check
serial: core: Initialize rs485 RTS polarity already on probe
serial: pl011: Fix incorrect rs485 RTS polarity on set_mctrl
serial: stm32: fix software flow control transfer
serial: stm32: prevent TDR register overwrite when sending x_char
tty: n_gsm: fix SW flow control encoding/handling
serial: 8250: of: Fix mapped region size when using reg-offset property
tty: rpmsg: Fix race condition releasing tty port
tty: Partially revert the removal of the Cyclades public API
tty: Add support for Brainboxes UC cards.
Revert "tty: serial: Use fifo in 8250 console driver"
Linus Torvalds [Sat, 29 Jan 2022 13:17:20 +0000 (15:17 +0200)]
Merge tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for 5.17-rc2 that resolve a
number of reported problems. These include:
- typec driver fixes
- xhci platform driver fixes for suspending
- ulpi core fix
- role.h build fix
- new device ids
- syzbot-reported bugfixes
- gadget driver fixes
- dwc3 driver fixes
- other small fixes
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: cdnsp: Fix segmentation fault in cdns_lost_power function
usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
usb: gadget: at91_udc: fix incorrect print type
usb: dwc3: xilinx: Fix error handling when getting USB3 PHY
usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
usb: xhci-plat: fix crash when suspend if remote wake enable
usb: common: ulpi: Fix crash in ulpi_match()
usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
ucsi_ccg: Check DEV_INT bit only when starting CCG4
USB: core: Fix hang in usb_kill_urb by adding memory barriers
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
usb: typec: tcpm: Do not disconnect when receiving VSAFE0V
usb: typec: tcpm: Do not disconnect while receiving VBUS off
usb: typec: Don't try to register component master without components
usb: typec: Only attempt to link USB ports if there is fwnode
usb: typec: tcpci: don't touch CC line if it's Vconn source
usb: roles: fix include/linux/usb/role.h compile issue
Linus Torvalds [Sat, 29 Jan 2022 13:01:08 +0000 (15:01 +0200)]
Merge tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request
- add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs (Wu
Zheng)
- remove the unneeded ret variable in nvmf_dev_show (Changcheng
Deng)
- Fix for a hang regression introduced with a patch in the merge
window, where low queue depth devices would not always get woken
correctly (Laibin)
- Small series fixing an IO accounting issue with bio backed dm devices
(Mike, Yu)
* tag 'block-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
dm: properly fix redundant bio-based IO accounting
dm: revert partial fix for redundant bio-based IO accounting
block: add bio_start_io_acct_time() to control start_time
blk-mq: Fix wrong wakeup batch configuration which will cause hang
nvme-fabrics: remove the unneeded ret variable in nvmf_dev_show
nvme-pci: add the IGNORE_DEV_SUBNQN quirk for Intel P4500/P4600 SSDs
blk-mq: fix missing blk_account_io_done() in error path
block: fix memory leak in disk_register_independent_access_ranges
Linus Torvalds [Sat, 29 Jan 2022 12:53:07 +0000 (14:53 +0200)]
Merge tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Just two small fixes this time:
- Fix a bug that can lead to node registration taking 1 second, when
it should finish much quicker (Dylan)
- Remove an unused argument from a function (Usama)"
* tag 'io_uring-5.17-2022-01-28' of git://git.kernel.dk/linux-block:
io_uring: remove unused argument from io_rsrc_node_alloc
io_uring: fix bug in slow unregistering of nodes
Linus Torvalds [Sat, 29 Jan 2022 12:46:19 +0000 (14:46 +0200)]
Merge tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix VM debug warnings on boot triggered via __set_fixmap().
- Fix a debug warning in the 64-bit Book3S PMU handling code.
- Fix nested guest HFSCR handling with multiple vCPUs on Power9 or
later.
- Fix decrementer storm caused by a recent change, seen with some
configs.
Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
Fabiano Rosas, Maxime Bizon, Nicholas Piggin, and Sachin Sant.
* tag 'powerpc-5.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/interrupt: Fix decrementer storm
KVM: PPC: Book3S HV Nested: Fix nested HFSCR being clobbered with multiple vCPUs
powerpc/perf: Fix power_pmu_disable to call clear_pmi_irq_pending only if PMI is pending
powerpc/fixmap: Fix VM debug warning on unmap