Christophe Leroy [Fri, 12 Mar 2021 12:50:41 +0000 (12:50 +0000)]
powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
In order to get more control in exception prolog, dismantle
all non standard exception macros, finishing with EXC_XFER_STD
and EXC_XFER_LITE and EXC_XFER_TEMPLATE.
Also remove transfer_to_handler_full and ret_from_except and
ret_from_except_full as they are not used anymore.
Last parameter of EXCEPTION() is now ignored, will be removed
in a later patch to avoid too much churn.
Christophe Leroy [Fri, 12 Mar 2021 12:50:40 +0000 (12:50 +0000)]
powerpc/32: Only restore non volatile registers when required
Until now, non volatile registers were restored everytime they
were saved, ie using EXC_XFER_STD meant saving and restoring
them while EXC_XFER_LITE meant neither saving not restoring them.
Now that they are always saved, EXC_XFER_STD means to restore
them and EXC_XFER_LITE means to not restore them.
Most of the users of EXC_XFER_STD only need to retrieve the
non volatile registers. For them there is no need to restore
the non volatile registers as they have not been modified.
Only very few exceptions require non volatile registers restore.
Opencode the few places which require saving of non volatile
registers.
Christophe Leroy [Fri, 12 Mar 2021 12:50:39 +0000 (12:50 +0000)]
powerpc/32: Add a prepare_transfer_to_handler macro for exception prologs
In order to increase flexibility, add a macro that will for now
call transfer_to_handler.
As transfer_to_handler doesn't do the actual transfer anymore,
also name it prepare_transfer_to_handler. The following patches
will progressively remove the use of transfer_to_handler label.
Christophe Leroy [Fri, 12 Mar 2021 12:50:35 +0000 (12:50 +0000)]
powerpc/32: Don't save thread.regs on interrupt entry
Since commit 06d67d54741a ("powerpc: make process.c suitable for both
32-bit and 64-bit"), thread.regs is set on task creation, no need to
set it again and again at each interrupt entry as it never change.
Christophe Leroy [Fri, 12 Mar 2021 12:50:33 +0000 (12:50 +0000)]
powerpc/32: Always save non volatile registers on exception entry
In preparation of handling exception entry and exit in C,
in order to simplify the handling, always save non volatile registers
when entering an exception.
Christophe Leroy [Fri, 12 Mar 2021 12:50:32 +0000 (12:50 +0000)]
powerpc/32: Perform normal function call in exception entry
Now that the MMU is re-enabled before calling the transfer function,
we don't need anymore that hack with the address of the handler and
the return function sitting just after the 'bl' to the transfer
fonction, that function is retrieving via a read relative to 'lr'.
Do a regular call to the transfer function, then to the handler,
then branch to the return function.
Christophe Leroy [Fri, 12 Mar 2021 12:50:29 +0000 (12:50 +0000)]
powerpc/32: Move exception prolog code into .text once MMU is back on
The space in the head section is rather constrained by the fact that
exception vectors are spread every 0x100 bytes and sometimes we
need to have "out of line" code because it doesn't fit.
Now that we are enabling MMU early in the prolog, take that opportunity
to jump somewhere else in the .text section where we don't have any
space constraint.
Christophe Leroy [Fri, 12 Mar 2021 12:50:25 +0000 (12:50 +0000)]
powerpc/32: Statically initialise first emergency context
The check of the emergency context initialisation in
vmap_stack_overflow is buggy for the SMP case, as it
compares r1 with 0 while in the SMP case r1 is offseted
by the CPU id.
Instead of fixing it, just perform static initialisation
of the first emergency context.
Christophe Leroy [Fri, 12 Mar 2021 12:50:23 +0000 (12:50 +0000)]
powerpc/32: Tag DAR in EXCEPTION_PROLOG_2 for the 8xx
8xx requires to tag the DAR with a magic value in order to
fixup DAR on faults generated by 'dcbX', as the 8xx
forgets to update the DAR for those faults.
Do the tagging as early as possible, that is before enabling MMU.
Christophe Leroy [Fri, 12 Mar 2021 12:50:21 +0000 (12:50 +0000)]
powerpc/32: Remove ksp_limit
ksp_limit is there to help detect stack overflows.
That is specific to ppc32 as it was removed from ppc64 in
commit cbc9565ee826 ("powerpc: Remove ksp_limit on ppc64").
There are other means for detecting stack overflows.
As ppc64 has proven to not need it, ppc32 should be able to do
without it too.
Christophe Leroy [Fri, 12 Mar 2021 12:50:16 +0000 (12:50 +0000)]
powerpc/40x: Prepare normal exception handler for enabling MMU early
Ensure normal exception handler are able to manage stuff with
MMU enabled. For that we use CONFIG_VMAP_STACK related code
allthough there is no intention to really activate CONFIG_VMAP_STACK
on powerpc 40x for the moment.
40x uses SPRN_DEAR instead of SPRN_DAR and SPRN_ESR instead of
SPRN_DSISR. Take it into account in common macros.
40x MSR value doesn't fit on 15 bits, use LOAD_REG_IMMEDIATE() in
common macros that will be used also with 40x.
Christophe Leroy [Fri, 12 Mar 2021 12:50:11 +0000 (12:50 +0000)]
powerpc/40x: Don't use SPRN_SPRG_SCRATCH0/1 in TLB miss handlers
SPRN_SPRG_SCRATCH5 is used to save SPRN_PID.
SPRN_SPRG_SCRATCH6 is already available.
SPRN_PID is only 8 bits. We have r12 that contains CR.
We only need to preserve CR0, so we have space available in r12
to save PID.
Keep PID in r12 and free up SPRN_SPRG_SCRATCH5.
Then In TLB miss handlers, instead of using SPRN_SPRG_SCRATCH0 and
SPRN_SPRG_SCRATCH1, use SPRN_SPRG_SCRATCH5 and SPRN_SPRG_SCRATCH6
to avoid future conflicts with normal exception prologs.
Christophe Leroy [Fri, 12 Mar 2021 12:50:10 +0000 (12:50 +0000)]
powerpc/traps: Declare unrecoverable_exception() as __noreturn
unrecoverable_exception() is never expected to return, most callers
have an infiniteloop in case it returns.
Ensure it really never returns by terminating it with a BUG(), and
declare it __no_return.
It always GCC to really simplify functions calling it. In the exemple
below, it avoids the stack frame in the likely fast path and avoids
code duplication for the exit.
Ravi Bangoria [Thu, 11 Mar 2021 09:15:38 +0000 (14:45 +0530)]
powerpc/uprobes: Validation for prefixed instruction
As per ISA 3.1, prefixed instruction should not cross 64-byte
boundary. So don't allow Uprobe on such prefixed instruction.
There are two ways probed instruction is changed in mapped pages.
First, when Uprobe is activated, it searches for all the relevant
pages and replace instruction in them. In this case, if that probe
is on the 64-byte unaligned prefixed instruction, error out
directly. Second, when Uprobe is already active and user maps a
relevant page via mmap(), instruction is replaced via mmap() code
path. But because Uprobe is invalid, entire mmap() operation can
not be stopped. In this case just print an error and continue.
Usually sigset_t is exactly 8B which is a "trivial" size and does not
warrant using __copy_from_user(). Use __get_user() directly in
anticipation of future work to remove the trivial size optimizations
from __copy_from_user().
The ppc32 implementation of get_sigset_t() previously called
copy_from_user() which, unlike __copy_from_user(), calls access_ok().
Replacing this w/ __get_user() (no access_ok()) is fine here since both
callsites in signal_32.c are preceded by an earlier access_ok().
Daniel Axtens [Sat, 27 Feb 2021 01:12:58 +0000 (19:12 -0600)]
powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.
Co-developed-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210227011259.11992-10-cmr@codefail.de
Daniel Axtens [Sat, 27 Feb 2021 01:12:57 +0000 (19:12 -0600)]
powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.
There is no 'unsafe' version of copy_siginfo_to_user, so move it
slightly to allow for a "longer" uaccess block.
Co-developed-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Christopher M. Riedl <cmr@codefail.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210227011259.11992-9-cmr@codefail.de
Previously restore_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.
Rewrite restore_sigcontext() to assume that a userspace read access
window is open by replacing all uaccess functions with their 'unsafe'
versions. Modify the callers to first open, call
unsafe_restore_sigcontext(), and then close the uaccess window.
Previously setup_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.
Rewrite setup_sigcontext() to assume that a userspace write access window
is open by replacing all uaccess functions with their 'unsafe' versions.
Modify the callers to first open, call unsafe_setup_sigcontext() and
then close the uaccess window.
powerpc/signal64: Remove TM ifdefery in middle of if/else block
Both rt_sigreturn() and handle_rt_signal_64() contain TM-related ifdefs
which break-up an if/else block. Provide stubs for the ifdef-guarded TM
functions and remove the need for an ifdef in rt_sigreturn().
Rework the remaining TM ifdef in handle_rt_signal64() similar to
commit f1cf4f93de2f ("powerpc/signal32: Remove ifdefery in middle of if/else").
Unlike in the commit for ppc32, the ifdef can't be removed entirely
since uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.
powerpc: Reference parameter in MSR_TM_ACTIVE() macro
Unlike the other MSR_TM_* macros, MSR_TM_ACTIVE does not reference or
use its parameter unless CONFIG_PPC_TRANSACTIONAL_MEM is defined. This
causes an 'unused variable' compile warning unless the variable is also
guarded with CONFIG_PPC_TRANSACTIONAL_MEM.
Reference but do nothing with the argument in the macro to avoid a
potential compile warning.
powerpc/signal64: Remove non-inline calls from setup_sigcontext()
The majority of setup_sigcontext() can be refactored to execute in an
"unsafe" context assuming an open uaccess window except for some
non-inline function calls. Move these out into a separate
prepare_setup_sigcontext() function which must be called first and
before opening up a uaccess window. Non-inline function calls should be
avoided during a uaccess window for a few reasons:
- KUAP should be enabled for as much kernel code as possible.
Opening a uaccess window disables KUAP which means any code
executed during this time contributes to a potential attack
surface.
- Non-inline functions default to traceable which means they are
instrumented for ftrace. This adds more code which could run
with KUAP disabled.
- Powerpc does not currently support the objtool UACCESS checks.
All code running with uaccess must be audited manually which
means: less code -> less work -> fewer problems (in theory).
A follow-up commit converts setup_sigcontext() to be "unsafe".
Davidlohr Bueso [Tue, 9 Mar 2021 01:59:50 +0000 (17:59 -0800)]
powerpc/qspinlock: Use generic smp_cond_load_relaxed
49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed()) added
busy-waiting pausing with a preferred SMT priority pattern, lowering
the priority (reducing decode cycles) during the whole loop slowpath.
However, data shows that while this pattern works well with simple
spinlocks, queued spinlocks benefit more being kept in medium priority,
with a cpu_relax() instead, being a low+medium combo on powerpc.
Data is from three benchmarks on a Power9: 9008-22L 64 CPUs with
2 sockets and 8 threads per core.
1. locktorture.
This is data for the lowest and most artificial/pathological level,
with increasing thread counts pounding on the lock. Metrics are total
ops/minute. Despite some small hits in the 4-8 range, scenarios are
either neutral or favorable to this patch.
Here a client will do one-way throughput tests to a localhost server, with
increasing message sizes, dealing with the sk_lock. This patch shows to put
the performance of the qspinlock back to par with that of the simple lock:
Davidlohr Bueso [Tue, 9 Mar 2021 01:59:49 +0000 (17:59 -0800)]
powerpc/spinlock: Unserialize spin_is_locked
c6f5d02b6a0f (locking/spinlocks/arm64: Remove smp_mb() from
arch_spin_is_locked()) made it pretty official that the call
semantics do not imply any sort of barriers, and any user that
gets creative must explicitly do any serialization.
This creativity, however, is nowadays pretty limited:
1. spin_unlock_wait() has been removed from the kernel in favor
of a lock/unlock combo. Furthermore, queued spinlocks have now
for a number of years no longer relied on _Q_LOCKED_VAL for the
call, but any non-zero value to indicate a locked state. There
were cases where the delayed locked store could lead to breaking
mutual exclusion with crossed locking; such as with sysv ipc and
netfilter being the most extreme.
2. The auditing Andrea did in verified that remaining spin_is_locked()
no longer rely on such semantics. Most callers just use it to assert
a lock is taken, in a debug nature. The only user that gets cute is
NOLOCK qdisc, as of:
96009c7d500e (sched: replace __QDISC_STATE_RUNNING bit with a spin lock)
... which ironically went in the next day after c6f5d02b6a0f. This
change replaces test_bit() with spin_is_locked() to know whether
to take the busylock heuristic to reduce contention on the main
qdisc lock. So any races against spin_is_locked() for archs that
use LL/SC for spin_lock() will be benign and not break any mutual
exclusion; furthermore, both the seqlock and busylock have the same
scope.
Christophe Leroy [Wed, 10 Mar 2021 17:57:02 +0000 (17:57 +0000)]
powerpc/uaccess: Move copy_mc_xxx() functions down
copy_mc_xxx() functions are in the middle of raw_copy functions.
For clarity, move them out of the raw_copy functions block.
They are using access_ok, so they need to be after the general
functions in order to eventually allow the inclusion of
asm-generic/uaccess.h in some future.
Laurent Dufour [Fri, 5 Mar 2021 12:55:54 +0000 (13:55 +0100)]
powerpc/pseries: export LPAR security flavor in lparcfg
This is helpful to read the security flavor from inside the LPAR.
In /sys/kernel/debug/powerpc/security_features it can be seen if
mitigations are on or off but not the level set through the ASMI menu.
Furthermore, reporting it through /proc/powerpc/lparcfg allows an easy
processing by the lparstat command [1].
Add architecture specific implementation details for KFENCE and enable
KFENCE for the ppc32 architecture. In particular, this implements the
required interface in <asm/kfence.h>.
KFENCE requires that attributes for pages from its memory pool can
individually be set. Therefore, force the Read/Write linear map to be
mapped at page granularity.
Zhang Yunkai [Thu, 4 Mar 2021 04:49:43 +0000 (20:49 -0800)]
powerpc: Remove duplicate includes
asm/tm.h included in traps.c is duplicated. It is also included on
the 62nd line.
asm/udbg.h included in setup-common.c is duplicated. It is also
included on the 61st line.
asm/bug.h included in arch/powerpc/include/asm/book3s/64/mmu-hash.h
is duplicated. It is also included on the 12th line.
asm/tlbflush.h included in arch/powerpc/include/asm/pgtable.h is
duplicated. It is also included on the 11th line.
asm/page.h included in arch/powerpc/include/asm/thread_info.h is
duplicated. It is also included on the 13th line.
Signed-off-by: Zhang Yunkai <zhang.yunkai@zte.com.cn>
[mpe: Squash together from multiple commits] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If identical_pvr_fixup() is not inlined, there are two modpost warnings:
WARNING: modpost: vmlinux.o(.text+0x54e8): Section mismatch in reference
from the function identical_pvr_fixup() to the function
.init.text:of_get_flat_dt_prop()
The function identical_pvr_fixup() references
the function __init of_get_flat_dt_prop().
This is often because identical_pvr_fixup lacks a __init
annotation or the annotation of of_get_flat_dt_prop is wrong.
WARNING: modpost: vmlinux.o(.text+0x551c): Section mismatch in reference
from the function identical_pvr_fixup() to the function
.init.text:identify_cpu()
The function identical_pvr_fixup() references
the function __init identify_cpu().
This is often because identical_pvr_fixup lacks a __init
annotation or the annotation of identify_cpu is wrong.
identical_pvr_fixup() calls two functions marked as __init and is only
called by a function marked as __init so it should be marked as __init
as well. At the same time, remove the inline keywork as it is not
necessary to inline this function. The compiler is still free to do so
if it feels it is worthwhile since commit 889b3c1245de ("compiler:
remove CONFIG_OPTIMIZE_INLINING entirely").
powerpc/fadump: Mark fadump_calculate_reserve_size as __init
If fadump_calculate_reserve_size() is not inlined, there is a modpost
warning:
WARNING: modpost: vmlinux.o(.text+0x5196c): Section mismatch in
reference from the function fadump_calculate_reserve_size() to the
function .init.text:parse_crashkernel()
The function fadump_calculate_reserve_size() references
the function __init parse_crashkernel().
This is often because fadump_calculate_reserve_size lacks a __init
annotation or the annotation of parse_crashkernel is wrong.
fadump_calculate_reserve_size() calls parse_crashkernel(), which is
marked as __init and fadump_calculate_reserve_size() is called from
within fadump_reserve_mem(), which is also marked as __init.
Mark fadump_calculate_reserve_size() as __init to fix the section
mismatch. Additionally, remove the inline keyword as it is not necessary
to inline this function; the compiler is still free to do so if it feels
it is worthwhile since commit 889b3c1245de ("compiler: remove
CONFIG_OPTIMIZE_INLINING entirely").
Russell Currey [Tue, 23 Feb 2021 07:02:27 +0000 (17:02 +1000)]
selftests/powerpc: Fix L1D flushing tests for Power10
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:
- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054
Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.
This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on.
The only potential downside is that referring to a specific perf event
requires PMU support implemented in the kernel for that platform.
Commit 407d418f2fd4c20a ("powerpc/chrp: Move PHB discovery") moved the
sole call to hydra_init() to the source file where it is defined, so it
can be made static.
powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used
The mutex linear_mapping_mutex is defined at the of the file while its
only two user are within the CONFIG_MEMORY_HOTPLUG block.
A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because
its mutex implementation is smart enough to realize that it is unused.
Move the definition of linear_mapping_mutex to ifdef block where it is
used.
Alexey Dobriyan [Sun, 14 Mar 2021 20:51:14 +0000 (23:51 +0300)]
prctl: fix PR_SET_MM_AUXV kernel stack leak
Doing a
prctl(PR_SET_MM, PR_SET_MM_AUXV, addr, 1);
will copy 1 byte from userspace to (quite big) on-stack array
and then stash everything to mm->saved_auxv.
AT_NULL terminator will be inserted at the very end.
/proc/*/auxv handler will find that AT_NULL terminator
and copy original stack contents to userspace.
Linus Torvalds [Sun, 14 Mar 2021 20:33:33 +0000 (13:33 -0700)]
Merge tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of irqchip updates:
- Make the GENERIC_IRQ_MULTI_HANDLER configuration correct
- Add a missing DT compatible string for the Ingenic driver
- Remove the pointless debugfs_file pointer from struct irqdomain"
* tag 'irq-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ingenic: Add support for the JZ4760
dt-bindings/irq: Add compatible string for the JZ4760B
irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER
ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly
irqdomain: Remove debugfs_file from struct irq_domain
Linus Torvalds [Sun, 14 Mar 2021 20:29:38 +0000 (13:29 -0700)]
Merge tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix in for hrtimers to prevent an interrupt storm caused by
the lack of reevaluation of the timers which expire in softirq context
under certain circumstances, e.g. when the clock was set"
* tag 'timers-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()
Linus Torvalds [Sun, 14 Mar 2021 20:15:55 +0000 (13:15 -0700)]
Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Thomas Gleixner:
"A single objtool fix to handle the PUSHF/POPF validation correctly for
the paravirt changes which modified arch_local_irq_restore not to use
popf"
* tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool,x86: Fix uaccess PUSHF/POPF validation
Linus Torvalds [Sun, 14 Mar 2021 20:03:21 +0000 (13:03 -0700)]
Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A couple of locking fixes:
- A fix for the static_call mechanism so it handles unaligned
addresses correctly.
- Make u64_stats_init() a macro so every instance gets a seperate
lockdep key.
- Make seqcount_latch_init() a macro as well to preserve the static
variable which is used for the lockdep key"
* tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
seqlock,lockdep: Fix seqcount_latch_init()
u64_stats,lockdep: Fix u64_stats_init() vs lockdep
static_call: Fix the module key fixup
Linus Torvalds [Sun, 14 Mar 2021 19:57:17 +0000 (12:57 -0700)]
Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Make sure PMU internal buffers are flushed for per-CPU events too and
properly handle PID/TID for large PEBS.
- Handle the case properly when there's no PMU and therefore return an
empty list of perf MSRs for VMX to switch instead of reading random
garbage from the stack.
* tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case
perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR
perf/core: Flush PMU internal buffers for per-CPU events
Linus Torvalds [Sun, 14 Mar 2021 19:54:56 +0000 (12:54 -0700)]
Merge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fix from Ard Biesheuvel via Borislav Petkov:
"Fix an oversight in the handling of EFI_RT_PROPERTIES_TABLE, which was
added v5.10, but failed to take the SetVirtualAddressMap() RT service
into account"
* tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table
Linus Torvalds [Sun, 14 Mar 2021 19:48:10 +0000 (12:48 -0700)]
Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- A couple of SEV-ES fixes and robustifications: verify usermode stack
pointer in NMI is not coming from the syscall gap, correctly track
IRQ states in the #VC handler and access user insn bytes atomically
in same handler as latter cannot sleep.
- Balance 32-bit fast syscall exit path to do the proper work on exit
and thus not confuse audit and ptrace frameworks.
- Two fixes for the ORC unwinder going "off the rails" into KASAN
redzones and when ORC data is missing.
* tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev-es: Use __copy_from_user_inatomic()
x86/sev-es: Correctly track IRQ states in runtime #VC handler
x86/sev-es: Check regs->sp is trusted before adjusting #VC IST stack
x86/sev-es: Introduce ip_within_syscall_gap() helper
x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls
x86/unwind/orc: Silence warnings caused by missing ORC data
x86/unwind/orc: Disable KASAN checking in the ORC unwinder, part 2
Linus Torvalds [Sun, 14 Mar 2021 19:37:43 +0000 (12:37 -0700)]
Merge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.12:
- Fix wrong instruction encoding for lis in ppc_function_entry(),
which could potentially lead to missed kprobes.
- Fix SET_FULL_REGS on 32-bit and 64e, which prevented ptrace of
non-volatile GPRs immediately after exec.
- Clean up a missed SRR specifier in the recent interrupt rework.
- Don't treat unrecoverable_exception() as an interrupt handler, it's
called from other handlers so shouldn't do the interrupt entry/exit
accounting itself.
- Fix build errors caused by missing declarations for
[en/dis]able_kernel_vsx().
Thanks to Christophe Leroy, Daniel Axtens, Geert Uytterhoeven, Jiri
Olsa, Naveen N. Rao, and Nicholas Piggin"
* tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/traps: unrecoverable_exception() is not an interrupt handler
powerpc: Fix missing declaration of [en/dis]able_kernel_vsx()
powerpc/64s/exception: Clean up a missed SRR specifier
powerpc: Fix inverted SET_FULL_REGS bitop
powerpc/64s: Use symbolic macros for function entry encoding
powerpc/64s: Fix instruction encoding for lis in ppc_function_entry()
Linus Torvalds [Sun, 14 Mar 2021 19:35:02 +0000 (12:35 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"More fixes for ARM and x86"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Advancing the timer expiration on guest initiated write
KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
kvm: x86: annotate RCU pointers
KVM: arm64: Fix exclusive limit for IPA size
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
KVM: arm64: Don't use cbz/adr with external symbols
KVM: arm64: Fix range alignment when walking page tables
KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
KVM: arm64: Fix nVHE hyp panic host context restore
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: nvhe: Save the SPE context early
kvm: x86: use NULL instead of using plain integer as pointer
KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
Linus Torvalds [Sun, 14 Mar 2021 19:23:34 +0000 (12:23 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"28 patches.
Subsystems affected by this series: mm (memblock, pagealloc, hugetlb,
highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and
zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and
ia64"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits)
zram: fix broken page writeback
zram: fix return value on writeback_store
mm/memcg: set memcg when splitting page
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
mm/userfaultfd: fix memory corruption due to writeprotect
kasan: fix KASAN_STACK dependency for HW_TAGS
kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
mm/madvise: replace ptrace attach requirement for process_madvise
include/linux/sched/mm.h: use rcu_dereference in in_vfork()
kfence: fix reports if constant function prefixes exist
kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations
kfence: fix printk format for ptrdiff_t
linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
MAINTAINERS: exclude uapi directories in API/ABI section
binfmt_misc: fix possible deadlock in bm_register_write
mm/highmem.c: fix zero_user_segments() with start > end
hugetlb: do early cow when page pinned on src mm
mm: use is_cow_mapping() across tree where proper
...
Thomas Gleixner [Sun, 14 Mar 2021 15:34:35 +0000 (16:34 +0100)]
Merge tag 'irqchip-fixes-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip fixes from Marc Zyngier:
- More compatible strings for the Ingenic irqchip (introducing the
JZ4760B SoC)
- Select GENERIC_IRQ_MULTI_HANDLER on the ARM ep93xx platform
- Drop all GENERIC_IRQ_MULTI_HANDLER selections from the irqchip
Kconfig, now relying on the architecture to get it right
- Drop the debugfs_file field from struct irq_domain, now that
debugfs can track things on its own
Linus Torvalds [Sat, 13 Mar 2021 20:38:44 +0000 (12:38 -0800)]
Merge tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small misc/char driver fixes to resolve some reported
problems:
- habanalabs driver fixes
- Acrn build fixes (reported many times)
- pvpanic module table export fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc/pvpanic: Export module FDT device table
misc: fastrpc: restrict user apps from sending kernel RPC messages
virt: acrn: Correct type casting of argument of copy_from_user()
virt: acrn: Use EPOLLIN instead of POLLIN
virt: acrn: Use vfs_poll() instead of f_op->poll()
virt: acrn: Make remove_cpu sysfs invisible with !CONFIG_HOTPLUG_CPU
cpu/hotplug: Fix build error of using {add,remove}_cpu() with !CONFIG_SMP
habanalabs: fix debugfs address translation
habanalabs: Disable file operations after device is removed
habanalabs: Call put_pid() when releasing control device
drivers: habanalabs: remove unused dentry pointer for debugfs files
habanalabs: mark hl_eq_inc_ptr() as static
Linus Torvalds [Sat, 13 Mar 2021 20:36:53 +0000 (12:36 -0800)]
Merge tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for reported problems. They
include:
- wfx header file cleanup patch reverted as it could cause problems
- comedi driver endian fixes
- buffer overflow problems for staging wifi drivers
- build dependency issue for rtl8192e driver
All have been in linux-next for a while with no reported problems"
* tag 'staging-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
Revert "staging: wfx: remove unused included header files"
staging: rtl8188eu: prevent ->ssid overflow in rtw_wx_set_scan()
staging: rtl8188eu: fix potential memory corruption in rtw_check_beacon_data()
staging: rtl8192u: fix ->ssid overflow in r8192_wx_set_scan()
staging: comedi: pcl726: Use 16-bit 0 for interrupt data
staging: comedi: ni_65xx: Use 16-bit 0 for interrupt data
staging: comedi: ni_6527: Use 16-bit 0 for interrupt data
staging: comedi: comedi_parport: Use 16-bit 0 for interrupt data
staging: comedi: amplc_pc236_common: Use 16-bit 0 for interrupt data
staging: comedi: pcl818: Fix endian problem for AI command data
staging: comedi: pcl711: Fix endian problem for AI command data
staging: comedi: me4000: Fix endian problem for AI command data
staging: comedi: dmm32at: Fix endian problem for AI command data
staging: comedi: das800: Fix endian problem for AI command data
staging: comedi: das6402: Fix endian problem for AI command data
staging: comedi: adv_pci1710: Fix endian problem for AI command data
staging: comedi: addi_apci_1500: Fix endian problem for command sample
staging: comedi: addi_apci_1032: Fix endian problem for COS sample
staging: ks7010: prevent buffer overflow in ks_wlan_set_scan()
staging: rtl8712: Fix possible buffer overflow in r8712_sitesurvey_cmd
...
Linus Torvalds [Sat, 13 Mar 2021 20:34:29 +0000 (12:34 -0800)]
Merge tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty and serial driver fixes to resolve some
reported problems:
- led tty trigger fixes based on review and were acked by the led
maintainer
- revert a max310x serial driver patch as it was causing problems
- revert a pty change as it was also causing problems
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "drivers:tty:pty: Fix a race causing data loss on close"
Revert "serial: max310x: rework RX interrupt handling"
leds: trigger/tty: Use led_set_brightness_sync() from workqueue
leds: trigger: Fix error path to not unlock the unlocked mutex
Linus Torvalds [Sat, 13 Mar 2021 20:32:57 +0000 (12:32 -0800)]
Merge tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a small number of USB fixes for 5.12-rc3 to resolve a bunch
of reported issues:
- usbip fixups for issues found by syzbot
- xhci driver fixes and quirk additions
- gadget driver fixes
- dwc3 QCOM driver fix
- usb-serial new ids and fixes
- usblp fix for a long-time issue
- cdc-acm quirk addition
- other tiny fixes for reported problems
All of these have been in linux-next for a while with no reported
issues"
* tag 'usb-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (25 commits)
xhci: Fix repeated xhci wake after suspend due to uncleared internal wake state
usb: xhci: Fix ASMedia ASM1042A and ASM3242 DMA addressing
xhci: Improve detection of device initiated wake signal.
usb: xhci: do not perform Soft Retry for some xHCI hosts
usbip: fix vudc usbip_sockfd_store races leading to gpf
usbip: fix vhci_hcd attach_store() races leading to gpf
usbip: fix stub_dev usbip_sockfd_store() races leading to gpf
usbip: fix vudc to check for stream socket
usbip: fix vhci_hcd to check for stream socket
usbip: fix stub_dev to check for stream socket
usb: dwc3: qcom: Add missing DWC3 OF node refcount decrement
USB: usblp: fix a hang in poll() if disconnected
USB: gadget: udc: s3c2410_udc: fix return value check in s3c2410_udc_probe()
usb: renesas_usbhs: Clear PIPECFG for re-enabling pipe with other EPNUM
usb: dwc3: qcom: Honor wakeup enabled/disabled state
usb: gadget: f_uac1: stop playback on function disable
usb: gadget: f_uac2: always increase endpoint max_packet_size by one audio slot
USB: gadget: u_ether: Fix a configfs return code
usb: dwc3: qcom: add ACPI device id for sc8180x
Goodix Fingerprint device is not a modem
...
Linus Torvalds [Sat, 13 Mar 2021 20:26:22 +0000 (12:26 -0800)]
Merge tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
"Fix an urgent regression introduced by commit baa2c7c97153 ("block:
set .bi_max_vecs as actual allocated vector number"), which could
cause unexpected hung since linux 5.12-rc1.
Resolve it by avoiding using bio->bi_max_vecs completely"
* tag 'erofs-for-5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: fix bio->bi_max_vecs behavior change
Linus Torvalds [Sat, 13 Mar 2021 20:18:59 +0000 (12:18 -0800)]
Merge tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- avoid 'make image_name' invoking syncconfig
- fix a couple of bugs in scripts/dummy-tools
- fix LLD_VENDOR and locale issues in scripts/ld-version.sh
- rebuild GCC plugins when the compiler is upgraded
- allow LTO to be enabled with KASAN_HW_TAGS
- allow LTO to be enabled without LLVM=1
* tag 'kbuild-fixes-v5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: fix ld-version.sh to not be affected by locale
kbuild: remove meaningless parameter to $(call if_changed_rule,dtc)
kbuild: remove LLVM=1 test from HAS_LTO_CLANG
kbuild: remove unneeded -O option to dtc
kbuild: dummy-tools: adjust to scripts/cc-version.sh
kbuild: Allow LTO to be selected with KASAN_HW_TAGS
kbuild: dummy-tools: support MPROFILE_KERNEL checks for ppc
kbuild: rebuild GCC plugins when the compiler is upgraded
kbuild: Fix ld-version.sh script if LLD was built with LLD_VENDOR
kbuild: dummy-tools: fix inverted tests for gcc
kbuild: add image_name to no-sync-config-targets
Minchan Kim [Sat, 13 Mar 2021 05:08:41 +0000 (21:08 -0800)]
zram: fix broken page writeback
commit 0d8359620d9b ("zram: support page writeback") introduced two
problems. It overwrites writeback_store's return value as kstrtol's
return value, which makes return value zero so user could see zero as
return value of write syscall even though it wrote data successfully.
It also breaks index value in the loop in that it doesn't increase the
index any longer. It means it can write only first starting block index
so user couldn't write all idle pages in the zram so lose memory saving
chance.
This patch fixes those issues.
Link: https://lkml.kernel.org/r/20210312173949.2197662-2-minchan@kernel.org Fixes: 0d8359620d9b("zram: support page writeback") Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Amos Bianchi <amosbianchi@google.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Sat, 13 Mar 2021 05:08:38 +0000 (21:08 -0800)]
zram: fix return value on writeback_store
writeback_store's return value is overwritten by submit_bio_wait's return
value. Thus, writeback_store will return zero since there was no IO
error. In the end, write syscall from userspace will see the zero as
return value, which could make the process stall to keep trying the write
until it will succeed.
Link: https://lkml.kernel.org/r/20210312173949.2197662-1-minchan@kernel.org Fixes: 3b82a051c101("drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: John Dias <joaodias@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhou Guanghui [Sat, 13 Mar 2021 05:08:33 +0000 (21:08 -0800)]
mm/memcg: set memcg when splitting page
As described in the split_page() comment, for the non-compound high order
page, the sub-pages must be freed individually. If the memcg of the first
page is valid, the tail pages cannot be uncharged when be freed.
For example, when alloc_pages_exact is used to allocate 1MB continuous
physical memory, 2MB is charged(kmemcg is enabled and __GFP_ACCOUNT is
set). When make_alloc_exact free the unused 1MB and free_pages_exact free
the applied 1MB, actually, only 4KB(one page) is uncharged.
Therefore, the memcg of the tail page needs to be set when splitting a
page.
Michel:
There are at least two explicit users of __GFP_ACCOUNT with
alloc_exact_pages added recently. See 7efe8ef274024 ("KVM: arm64:
Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT") and c419621873713
("KVM: s390: Add memcg accounting to KVM allocations"), so this is not
just a theoretical issue.
Link: https://lkml.kernel.org/r/20210304074053.65527-3-zhouguanghui1@huawei.com Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Rui Xiang <rui.xiang@huawei.com> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Weilong Chen <chenweilong@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhou Guanghui [Sat, 13 Mar 2021 05:08:30 +0000 (21:08 -0800)]
mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument
Rename mem_cgroup_split_huge_fixup to split_page_memcg and explicitly pass
in page number argument.
In this way, the interface name is more common and can be used by
potential users. In addition, the complete info(memcg and flag) of the
memcg needs to be set to the tail pages.
Link: https://lkml.kernel.org/r/20210304074053.65527-2-zhouguanghui1@huawei.com Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Weilong Chen <chenweilong@huawei.com> Cc: Rui Xiang <rui.xiang@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls
In https://bugs.gentoo.org/769614 Dmitry noticed that
`ptrace(PTRACE_GET_SYSCALL_INFO)` does not work for syscalls called via
glibc's syscall() wrapper.
ia64 has two ways to call syscalls from userspace: via `break` and via
`eps` instructions.
The difference is in stack layout:
1. `eps` creates simple stack frame: no locals, in{0..7} == out{0..8}
2. `break` uses userspace stack frame: may be locals (glibc provides
one), in{0..7} == out{0..8}.
Both work fine in syscall handling cde itself.
But `ptrace(PTRACE_GET_SYSCALL_INFO)` uses unwind mechanism to
re-extract syscall arguments but it does not account for locals.
The change always skips locals registers. It should not change `eps`
path as kernel's handler already enforces locals=0 and fixes `break`.
Tested on v5.10 on rx3600 machine (ia64 9040 CPU).
Link: https://lkml.kernel.org/r/20210221002554.333076-1-slyfox@gentoo.org Link: https://bugs.gentoo.org/769614 Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org> Reported-by: Dmitry V. Levin <ldv@altlinux.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nadav Amit [Sat, 13 Mar 2021 05:08:17 +0000 (21:08 -0800)]
mm/userfaultfd: fix memory corruption due to writeprotect
Userfaultfd self-test fails occasionally, indicating a memory corruption.
Analyzing this problem indicates that there is a real bug since mmap_lock
is only taken for read in mwriteprotect_range() and defers flushes, and
since there is insufficient consideration of concurrent deferred TLB
flushes in wp_page_copy(). Although the PTE is flushed from the TLBs in
wp_page_copy(), this flush takes place after the copy has already been
performed, and therefore changes of the page are possible between the time
of the copy and the time in which the PTE is flushed.
To make matters worse, memory-unprotection using userfaultfd also poses a
problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userrfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects memory region, it
unintentionally *clears* the RW-bit if it was already set. Note that this
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.
The scenario that happens in selftests/vm/userfaultfd is as follows:
This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.
To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.
Further optimizations will follow to avoid during uffd-write-unprotect
unnecassary PTE write-protection and TLB flushes.
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Signed-off-by: Nadav Amit <namit@vmware.com> Suggested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [5.9+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>