Linus Torvalds [Sat, 13 Jun 2020 17:55:29 +0000 (10:55 -0700)]
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- fix for "hex" Kconfig default to use 0x0 rather than 0 to allow these
to be removed from defconfigs
- fix from Ard Biesheuvel for EFI HYP mode booting
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8985/1: efi/decompressor: deal with HYP mode boot gracefully
ARM: 8984/1: Kconfig: set default ZBOOT_ROM_TEXT/BSS value to 0x0
Linus Torvalds [Sat, 13 Jun 2020 17:51:29 +0000 (10:51 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
"A few changes for alpha. They're mostly small janitorial fixes but
there's also a build fix and most notably a patch from Mikulas that
fixes a hang on boot on the Avanti platform, which required quite a
bit of work and review"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: Fix build around srm_sysrq_reboot_op
alpha: c_next should increase position index
alpha: Replace sg++ with sg = sg_next(sg)
alpha: fix memory barriers so that they conform to the specification
alpha: remove unneeded semicolon in sys_eiger.c
alpha: remove unneeded semicolon in osf_sys.c
alpha: Replace strncmp with str_has_prefix
alpha: fix rtc port ranges
alpha: Kconfig: pedantic formatting
Linus Torvalds [Sat, 13 Jun 2020 17:21:00 +0000 (10:21 -0700)]
Merge tag 'ras-core-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Thomas Gleixner:
"RAS updates from Borislav Petkov:
- Unmap a whole guest page if an MCE is encountered in it to avoid
follow-on MCEs leading to the guest crashing, by Tony Luck.
This change collided with the entry changes and the merge
resolution would have been rather unpleasant. To avoid that the
entry branch was merged in before applying this. The resulting code
did not change over the rebase.
- AMD MCE error thresholding machinery cleanup and hotplug
sanitization, by Thomas Gleixner.
- Change the MCE notifiers to denote whether they have handled the
error and not break the chain early by returning NOTIFY_STOP, thus
giving the opportunity for the later handlers in the chain to see
it. By Tony Luck.
- Add AMD family 0x17, models 0x60-6f support, by Alexander Monakov.
- Last but not least, the usual round of fixes and improvements"
* tag 'ras-core-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/mce/dev-mcelog: Fix -Wstringop-truncation warning about strncpy()
x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned
EDAC/amd64: Add AMD family 17h model 60h PCI IDs
hwmon: (k10temp) Add AMD family 17h model 60h PCI match
x86/amd_nb: Add AMD family 17h model 60h PCI IDs
x86/mcelog: Add compat_ioctl for 32-bit mcelog support
x86/mce: Drop bogus comment about mce.kflags
x86/mce: Fixup exception only for the correct MCEs
EDAC: Drop the EDAC report status checks
x86/mce: Add mce=print_all option
x86/mce: Change default MCE logger to check mce->kflags
x86/mce: Fix all mce notifiers to update the mce->kflags bitmask
x86/mce: Add a struct mce.kflags field
x86/mce: Convert the CEC to use the MCE notifier
x86/mce: Rename "first" function as "early"
x86/mce/amd, edac: Remove report_gart_errors
x86/mce/amd: Make threshold bank setting hotplug robust
x86/mce/amd: Cleanup threshold device remove path
x86/mce/amd: Straighten CPU hotplug path
x86/mce/amd: Sanitize thresholding device creation hotplug path
...
Linus Torvalds [Sat, 13 Jun 2020 17:05:47 +0000 (10:05 -0700)]
Merge tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Thomas Gleixner:
"The x86 entry, exception and interrupt code rework
This all started about 6 month ago with the attempt to move the Posix
CPU timer heavy lifting out of the timer interrupt code and just have
lockless quick checks in that code path. Trivial 5 patches.
This unearthed an inconsistency in the KVM handling of task work and
the review requested to move all of this into generic code so other
architectures can share.
Valid request and solved with another 25 patches but those unearthed
inconsistencies vs. RCU and instrumentation.
Digging into this made it obvious that there are quite some
inconsistencies vs. instrumentation in general. The int3 text poke
handling in particular was completely unprotected and with the batched
update of trace events even more likely to expose to endless int3
recursion.
In parallel the RCU implications of instrumenting fragile entry code
came up in several discussions.
The conclusion of the x86 maintainer team was to go all the way and
make the protection against any form of instrumentation of fragile and
dangerous code pathes enforcable and verifiable by tooling.
A first batch of preparatory work hit mainline with commit 60df1c925ded ("Pull x86 entry code updates from Thomas Gleixner")
That (almost) full solution introduced a new code section
'.noinstr.text' into which all code which needs to be protected from
instrumentation of all sorts goes into. Any call into instrumentable
code out of this section has to be annotated. objtool has support to
validate this.
Kprobes now excludes this section fully which also prevents BPF from
fiddling with it and all 'noinstr' annotated functions also keep
ftrace off. The section, kprobes and objtool changes are already
merged.
The major changes coming with this are:
- Preparatory cleanups
- Annotating of relevant functions to move them into the
noinstr.text section or enforcing inlining by marking them
__always_inline so the compiler cannot misplace or instrument
them.
- Splitting and simplifying the idtentry macro maze so that it is
now clearly separated into simple exception entries and the more
interesting ones which use interrupt stacks and have the paranoid
handling vs. CR3 and GS.
- Move quite some of the low level ASM functionality into C code:
- enter_from and exit to user space handling. The ASM code now
calls into C after doing the really necessary ASM handling and
the return path goes back out without bells and whistels in
ASM.
- exception entry/exit got the equivivalent treatment
- move all IRQ tracepoints from ASM to C so they can be placed as
appropriate which is especially important for the int3
recursion issue.
- Consolidate the declaration and definition of entry points between
32 and 64 bit. They share a common header and macros now.
- Remove the extra device interrupt entry maze and just use the
regular exception entry code.
- All ASM entry points except NMI are now generated from the shared
header file and the corresponding macros in the 32 and 64 bit
entry ASM.
- The C code entry points are consolidated as well with the help of
DEFINE_IDTENTRY*() macros. This allows to ensure at one central
point that all corresponding entry points share the same
semantics. The actual function body for most entry points is in an
instrumentable and sane state.
There are special macros for the more sensitive entry points, e.g.
INT3 and of course the nasty paranoid #NMI, #MCE, #DB and #DF.
They allow to put the whole entry instrumentation and RCU handling
into safe places instead of the previous pray that it is correct
approach.
- The INT3 text poke handling is now completely isolated and the
recursion issue banned. Aside of the entry rework this required
other isolation work, e.g. the ability to force inline bsearch.
- Prevent #DB on fragile entry code, entry relevant memory and
disable it on NMI, #MC entry, which allowed to get rid of the
nested #DB IST stack shifting hackery.
- A few other cleanups and enhancements which have been made
possible through this and already merged changes, e.g.
consolidating and further restricting the IDT code so the IDT
table becomes RO after init which removes yet another popular
attack vector
- About 680 lines of ASM maze are gone.
There are a few open issues:
- An escape out of the noinstr section in the MCE handler which needs
some more thought but under the aspect that MCE is a complete
trainwreck by design and the propability to survive it is low, this
was not high on the priority list.
- Paravirtualization
When PV is enabled then objtool complains about a bunch of indirect
calls out of the noinstr section. There are a few straight forward
ways to fix this, but the other issues vs. general correctness were
more pressing than parawitz.
- KVM
KVM is inconsistent as well. Patches have been posted, but they
have not yet been commented on or picked up by the KVM folks.
- IDLE
Pretty much the same problems can be found in the low level idle
code especially the parts where RCU stopped watching. This was
beyond the scope of the more obvious and exposable problems and is
on the todo list.
The lesson learned from this brain melting exercise to morph the
evolved code base into something which can be validated and understood
is that once again the violation of the most important engineering
principle "correctness first" has caused quite a few people to spend
valuable time on problems which could have been avoided in the first
place. The "features first" tinkering mindset really has to stop.
With that I want to say thanks to everyone involved in contributing to
this effort. Special thanks go to the following people (alphabetical
order): Alexandre Chartre, Andy Lutomirski, Borislav Petkov, Brian
Gerst, Frederic Weisbecker, Josh Poimboeuf, Juergen Gross, Lai
Jiangshan, Macro Elver, Paolo Bonzin,i Paul McKenney, Peter Zijlstra,
Vitaly Kuznetsov, and Will Deacon"
* tag 'x86-entry-2020-06-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (142 commits)
x86/entry: Force rcu_irq_enter() when in idle task
x86/entry: Make NMI use IDTENTRY_RAW
x86/entry: Treat BUG/WARN as NMI-like entries
x86/entry: Unbreak __irqentry_text_start/end magic
x86/entry: __always_inline CR2 for noinstr
lockdep: __always_inline more for noinstr
x86/entry: Re-order #DB handler to avoid *SAN instrumentation
x86/entry: __always_inline arch_atomic_* for noinstr
x86/entry: __always_inline irqflags for noinstr
x86/entry: __always_inline debugreg for noinstr
x86/idt: Consolidate idt functionality
x86/idt: Cleanup trap_init()
x86/idt: Use proper constants for table size
x86/idt: Add comments about early #PF handling
x86/idt: Mark init only functions __init
x86/entry: Rename trace_hardirqs_off_prepare()
x86/entry: Clarify irq_{enter,exit}_rcu()
x86/entry: Remove DBn stacks
x86/entry: Remove debug IDT frobbing
x86/entry: Optimize local_db_save() for virt
...
Linus Torvalds [Sat, 13 Jun 2020 16:56:21 +0000 (09:56 -0700)]
Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull notification queue from David Howells:
"This adds a general notification queue concept and adds an event
source for keys/keyrings, such as linking and unlinking keys and
changing their attributes.
Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:
Without this, g-o-a has to constantly poll a keyring-based kerberos
cache to find out if kinit has changed anything.
[ There are other notification pending: mount/sb fsinfo notifications
for libmount that Karel Zak and Ian Kent have been working on, and
Christian Brauner would like to use them in lxc, but let's see how
this one works first ]
LSM hooks are included:
- A set of hooks are provided that allow an LSM to rule on whether or
not a watch may be set. Each of these hooks takes a different
"watched object" parameter, so they're not really shareable. The
LSM should use current's credentials. [Wanted by SELinux & Smack]
- A hook is provided to allow an LSM to rule on whether or not a
particular message may be posted to a particular queue. This is
given the credentials from the event generator (which may be the
system) and the watch setter. [Wanted by Smack]
I've provided SELinux and Smack with implementations of some of these
hooks.
WHY
===
Key/keyring notifications are desirable because if you have your
kerberos tickets in a file/directory, your Gnome desktop will monitor
that using something like fanotify and tell you if your credentials
cache changes.
However, we also have the ability to cache your kerberos tickets in
the session, user or persistent keyring so that it isn't left around
on disk across a reboot or logout. Keyrings, however, cannot currently
be monitored asynchronously, so the desktop has to poll for it - not
so good on a laptop. This facility will allow the desktop to avoid the
need to poll.
DESIGN DECISIONS
================
- The notification queue is built on top of a standard pipe. Messages
are effectively spliced in. The pipe is opened with a special flag:
pipe2(fds, O_NOTIFICATION_PIPE);
The special flag has the same value as O_EXCL (which doesn't seem
like it will ever be applicable in this context)[?]. It is given up
front to make it a lot easier to prohibit splice&co from accessing
the pipe.
[?] Should this be done some other way? I'd rather not use up a new
O_* flag if I can avoid it - should I add a pipe3() system call
instead?
Messages are then read out of the pipe using read().
- It should be possible to allow write() to insert data into the
notification pipes too, but this is currently disabled as the
kernel has to be able to insert messages into the pipe *without*
holding pipe->mutex and the code to make this work needs careful
auditing.
- sendfile(), splice() and vmsplice() are disabled on notification
pipes because of the pipe->mutex issue and also because they
sometimes want to revert what they just did - but one or more
notification messages might've been interleaved in the ring.
- The kernel inserts messages with the wait queue spinlock held. This
means that pipe_read() and pipe_write() have to take the spinlock
to update the queue pointers.
- Records in the buffer are binary, typed and have a length so that
they can be of varying size.
This allows multiple heterogeneous sources to share a common
buffer; there are 16 million types available, of which I've used
just a few, so there is scope for others to be used. Tags may be
specified when a watchpoint is created to help distinguish the
sources.
- Records are filterable as types have up to 256 subtypes that can be
individually filtered. Other filtration is also available.
- Notification pipes don't interfere with each other; each may be
bound to a different set of watches. Any particular notification
will be copied to all the queues that are currently watching for it
- and only those that are watching for it.
- When recording a notification, the kernel will not sleep, but will
rather mark a queue as having lost a message if there's
insufficient space. read() will fabricate a loss notification
message at an appropriate point later.
- The notification pipe is created and then watchpoints are attached
to it, using one of:
where in both cases, fd indicates the queue and the number after is
a tag between 0 and 255.
- Watches are removed if either the notification pipe is destroyed or
the watched object is destroyed. In the latter case, a message will
be generated indicating the enforced watch removal.
Things I want to avoid:
- Introducing features that make the core VFS dependent on the
network stack or networking namespaces (ie. usage of netlink).
- Dumping all this stuff into dmesg and having a daemon that sits
there parsing the output and distributing it as this then puts the
responsibility for security into userspace and makes handling
namespaces tricky. Further, dmesg might not exist or might be
inaccessible inside a container.
- Letting users see events they shouldn't be able to see.
TESTING AND MANPAGES
====================
- The keyutils tree has a pipe-watch branch that has keyctl commands
for making use of notifications. Proposed manual pages can also be
found on this branch, though a couple of them really need to go to
the main manpages repository instead.
If the kernel supports the watching of keys, then running "make
test" on that branch will cause the testing infrastructure to spawn
a monitoring process on the side that monitors a notifications pipe
for all the key/keyring changes induced by the tests and they'll
all be checked off to make sure they happened.
- A test program is provided (samples/watch_queue/watch_test) that
can be used to monitor for keyrings, mount and superblock events.
Information on the notifications is simply logged to stdout"
* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
smack: Implement the watch_key and post_notification hooks
selinux: Implement the watch_key security hook
keys: Make the KEY_NEED_* perms an enum rather than a mask
pipe: Add notification lossage handling
pipe: Allow buffers to be marked read-whole-or-error for notifications
Add sample notification program
watch_queue: Add a key/keyring notification facility
security: Add hooks to rule on setting a watch
pipe: Add general notification queue support
pipe: Add O_NOTIFICATION_PIPE
security: Add a hook for the point of notification insertion
uapi: General notification queue definitions
Ard Biesheuvel [Fri, 12 Jun 2020 10:21:35 +0000 (11:21 +0100)]
ARM: 8985/1: efi/decompressor: deal with HYP mode boot gracefully
EFI on ARM only supports short descriptors, and given that it mandates
that the MMU and caches are on, it is implied that booting in HYP mode
is not supported.
However, implementations of EFI exist (i.e., U-Boot) that ignore this
requirement, which is not entirely unreasonable, given that it makes
HYP mode inaccessible to the operating system.
So let's make sure that we can deal with this condition gracefully.
We already tolerate booting the EFI stub with the caches off (even
though this violates the EFI spec as well), and so we should deal
with HYP mode boot with MMU and caches either on or off.
- When the MMU and caches are on, we can ignore the HYP stub altogether,
since we can carry on executing at HYP. We do need to ensure that we
disable the MMU at HYP before entering the kernel proper.
- When the MMU and caches are off, we have to drop to SVC mode so that
we can set up the page tables using short descriptors. In this case,
we need to install the HYP stub as usual, so that we can return to HYP
mode before handing over to the kernel proper.
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Chris Packham [Tue, 9 Jun 2020 02:28:14 +0000 (03:28 +0100)]
ARM: 8984/1: Kconfig: set default ZBOOT_ROM_TEXT/BSS value to 0x0
ZBOOT_ROM_TEXT and ZBOOT_ROM_BSS are defined as 'hex' but had a default
of "0". Kconfig will helpfully expand a text entry of 0 to 0x0 but
because this is not the same as the default value it was treated as
being explicitly set when running 'make savedefconfig' so most arm
defconfigs have CONFIG_ZBOOT_ROM_TEXT=0x0 and CONFIG_ZBOOT_ROM_BSS=0x0.
Change the default to 0x0 which will mean next time the defconfigs are
re-generated the spurious config entries will be removed.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Joerg Roedel [Thu, 11 Jun 2020 09:11:39 +0000 (11:11 +0200)]
alpha: Fix build around srm_sysrq_reboot_op
The patch introducing the struct was probably never compile tested,
because it sets a handler with a wrong function signature. Wrap the
handler into a functions with the correct signature to fix the build.
Fixes: 866e565ae70f ("tty/sysrq: alpha: export and use __sysrq_get_key_op()") Cc: Emil Velikov <emil.l.velikov@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Matt Turner <mattst88@gmail.com>
Mikulas Patocka [Tue, 26 May 2020 14:47:49 +0000 (10:47 -0400)]
alpha: fix memory barriers so that they conform to the specification
The commits 1f0f0fc705fb and ac77c024a435 broke boot on the Alpha Avanti
platform. The patches move memory barriers after a write before the write.
The result is that if there's iowrite followed by ioread, there is no
barrier between them.
The Alpha architecture allows reordering of the accesses to the I/O space,
and the missing barrier between write and read causes hang with serial
port and real time clock.
This patch makes barriers confiorm to the specification.
1. We add mb() before readX_relaxed and writeX_relaxed -
memory-barriers.txt claims that these functions must be ordered w.r.t.
each other. Alpha doesn't order them, so we need an explicit barrier.
2. We add mb() before reads from the I/O space - so that if there's a
write followed by a read, there should be a barrier between them.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 1f0f0fc705fb ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: ac77c024a435 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: stable@vger.kernel.org # v4.17+ Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Matt Turner <mattst88@gmail.com>
In commit ce6f8367934d
("tracing: Use str_has_prefix() instead of using fixed sizes")
the newly introduced str_has_prefix() was used
to replace error-prone strncmp(str, const, len).
Here fix codes with the same pattern.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Matt Turner <mattst88@gmail.com>
Linus Torvalds [Fri, 12 Jun 2020 19:38:18 +0000 (12:38 -0700)]
Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull proc fix from Eric Biederman:
"Much to my surprise syzbot found a very old bug in proc that the
recent changes made easier to reproce. This bug is subtle enough it
looks like it fooled everyone who should know better"
* 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
proc: Use new_inode not new_inode_pseudo
Thomas Gleixner [Fri, 12 Jun 2020 13:55:00 +0000 (15:55 +0200)]
x86/entry: Force rcu_irq_enter() when in idle task
The idea of conditionally calling into rcu_irq_enter() only when RCU is
not watching turned out to be not completely thought through.
Paul noticed occasional premature end of grace periods in RCU torture
testing. Bisection led to the commit which made the invocation of
rcu_irq_enter() conditional on !rcu_is_watching().
It turned out that this conditional breaks RCU assumptions about the idle
task when the scheduler tick happens to be a nested interrupt. Nested
interrupts can happen when the first interrupt invokes softirq processing
on return which enables interrupts.
If that nested tick interrupt does not invoke rcu_irq_enter() then the
RCU's irq-nesting checks will believe that this interrupt came directly
from idle, which will cause RCU to report a quiescent state. Because this
interrupt instead came from a softirq handler which might have been
executing an RCU read-side critical section, this can cause the grace
period to end prematurely.
Change the condition from !rcu_is_watching() to is_idle_task(current) which
enforces that interrupts in the idle task unconditionally invoke
rcu_irq_enter() independent of the RCU state.
This is also correct vs. user mode entries in NOHZ full scenarios because
user mode entries bring RCU out of EQS and force the RCU irq nesting state
accounting to nested. As only the first interrupt can enter from user mode
a nested tick interrupt will enter from kernel mode and as the nesting
state accounting is forced to nesting it will not do anything stupid even
if rcu_irq_enter() has not been invoked.
Fixes: 42088813a772 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()") Reported-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: "Paul E. McKenney" <paulmck@kernel.org> Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lkml.kernel.org/r/87wo4cxubv.fsf@nanos.tec.linutronix.de
Linus Torvalds [Fri, 12 Jun 2020 19:24:42 +0000 (12:24 -0700)]
Merge tag 'pwm/for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"Nothing too exciting for this cycle. A couple of fixes across the
board, and Lee volunteered to help with patch review"
* tag 'pwm/for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: Add missing "CONFIG_" prefix
MAINTAINERS: Add Lee Jones as reviewer for the PWM subsystem
pwm: imx27: Fix rounding behavior
pwm: rockchip: Simplify rockchip_pwm_get_state()
pwm: img: Call pm_runtime_put() in pm_runtime_get_sync() failed case
pwm: tegra: Support dynamic clock frequency configuration
pwm: jz4740: Add support for the JZ4725B
pwm: jz4740: Make PWM start with the active part
pwm: jz4740: Enhance precision in calculation of duty cycle
pwm: jz4740: Drop dependency on MACH_INGENIC
pwm: lpss: Fix get_state runtime-pm reference handling
pwm: sun4i: Support direct clock output on Allwinner A64
pwm: Add support for Azoteq IQS620A PWM generator
dt-bindings: pwm: rcar: add r8a77961 support
pwm: Add missing '\n' in log messages
Linus Torvalds [Fri, 12 Jun 2020 19:19:13 +0000 (12:19 -0700)]
Merge tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu driver directory structure cleanup from Joerg Roedel:
"Move the Intel and AMD IOMMU drivers into their own subdirectory.
Both drivers consist of several files by now and giving them their own
directory unclutters the IOMMU top-level directory a bit"
* tag 'iommu-drivers-move-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Move Intel IOMMU driver into subdirectory
iommu/amd: Move AMD IOMMU driver into subdirectory
Linus Torvalds [Fri, 12 Jun 2020 19:13:36 +0000 (12:13 -0700)]
Merge tag 'printk-for-5.8-kdb-nmi' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk fix from Petr Mladek:
"One more printk change for 5.8: make sure that messages printed from
KDB context are redirected to KDB console handlers. It did not work
when KDB interrupted NMI or printk_safe contexts.
Arm people started hitting this problem more often recently. I forgot
to add the fix into the previous pull request by mistake"
* tag 'printk-for-5.8-kdb-nmi' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk/kdb: Redirect printk messages into kdb in any context
Recently syzbot reported that unmounting proc when there is an ongoing
inotify watch on the root directory of proc could result in a use
after free when the watch is removed after the unmount of proc
when the watcher exits.
Commit 5cf49498f396 ("proc: Remove the now unnecessary internal mount
of proc") made it easier to unmount proc and allowed syzbot to see the
problem, but looking at the code it has been around for a long time.
Looking at the code the fsnotify watch should have been removed by
fsnotify_sb_delete in generic_shutdown_super. Unfortunately the inode
was allocated with new_inode_pseudo instead of new_inode so the inode
was not on the sb->s_inodes list. Which prevented
fsnotify_unmount_inodes from finding the inode and removing the watch
as well as made it so the "VFS: Busy inodes after unmount" warning
could not find the inodes to warn about them.
Make all of the inodes in proc visible to generic_shutdown_super,
and fsnotify_sb_delete by using new_inode instead of new_inode_pseudo.
The only functional difference is that new_inode places the inodes
on the sb->s_inodes list.
I wrote a small test program and I can verify that without changes it
can trigger this issue, and by replacing new_inode_pseudo with
new_inode the issues goes away.
Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/000000000000d788c905a7dfa3f4@google.com Reported-by: syzbot+7d2debdcdb3cb93c1e5e@syzkaller.appspotmail.com Fixes: 7858816a93e3 ("proc: Implement /proc/thread-self to point at the directory of the current thread") Fixes: 0a108cd2d321 ("procfs: switch /proc/self away from proc_dir_entry") Fixes: 1e51fb6012a1 ("vfs,proc: guarantee unique inodes in /proc") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Linus Torvalds [Fri, 12 Jun 2020 18:56:43 +0000 (11:56 -0700)]
Merge tag 'devicetree-fixes-for-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fixes from Rob Herring:
- Another round of whack-a-mole removing 'allOf', redundant cases of
'maxItems' and incorrect 'reg' sizes
- Fix support for yaml.h in non-standard paths
* tag 'devicetree-fixes-for-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: Remove redundant 'maxItems'
dt-bindings: Fix more incorrect 'reg' property sizes in examples
dt-bindings: phy: qcom: Fix missing 'ranges' and example addresses
dt-bindings: Remove more cases of 'allOf' containing a '$ref'
scripts/dtc: use pkg-config to include <yaml.h> in non-standard path
Linus Torvalds [Fri, 12 Jun 2020 18:05:52 +0000 (11:05 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
"The guest side of the asynchronous page fault work has been delayed to
5.9 in order to sync with Thomas's interrupt entry rework, but here's
the rest of the KVM updates for this merge window.
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (62 commits)
KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
KVM: selftests: fix sync_with_host() in smm_test
KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected
KVM: async_pf: Cleanup kvm_setup_async_pf()
kvm: i8254: remove redundant assignment to pointer s
KVM: x86: respect singlestep when emulating instruction
KVM: selftests: Don't probe KVM_CAP_HYPERV_ENLIGHTENED_VMCS when nested VMX is unsupported
KVM: selftests: do not substitute SVM/VMX check with KVM_CAP_NESTED_STATE check
KVM: nVMX: Consult only the "basic" exit reason when routing nested exit
KVM: arm64: Move hyp_symbol_addr() to kvm_asm.h
KVM: arm64: Synchronize sysreg state on injecting an AArch32 exception
KVM: arm64: Make vcpu_cp1x() work on Big Endian hosts
KVM: arm64: Remove host_cpu_context member from vcpu structure
KVM: arm64: Stop sparse from moaning at __hyp_this_cpu_ptr
KVM: arm64: Handle PtrAuth traps early
KVM: x86: Unexport x86_fpu_cache and make it static
KVM: selftests: Ignore KVM 5-level paging support for VM_MODE_PXXV48_4K
KVM: arm64: Save the host's PtrAuth keys in non-preemptible context
KVM: arm64: Stop save/restoring ACTLR_EL1
KVM: arm64: Add emulation for 32bit guests accessing ACTLR2
...
Linus Torvalds [Fri, 12 Jun 2020 18:00:45 +0000 (11:00 -0700)]
Merge tag 'for-linus-5.8b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- several smaller cleanups
- a fix for a Xen guest regression with CPU offlining
- a small fix in the xen pvcalls backend driver
- an update of MAINTAINERS
* tag 'for-linus-5.8b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
xen/pci: Get rid of verbose_request and use dev_dbg() instead
xenbus: Use dev_printk() when possible
xen-pciback: Use dev_printk() when possible
xen: enable BALLOON_MEMORY_HOTPLUG by default
xen: expand BALLOON_MEMORY_HOTPLUG description
xen/pvcalls: Make pvcalls_back_global static
xen/cpuhotplug: Fix initial CPU offlining for PV(H) guests
xen-platform: Constify dev_pm_ops
xen/pvcalls-back: test for errors when calling backend_connect()
Thomas Gleixner [Fri, 12 Jun 2020 12:02:27 +0000 (14:02 +0200)]
x86/entry: Make NMI use IDTENTRY_RAW
For no reason other than beginning brainmelt, IDTENTRY_NMI was mapped to
IDTENTRY_IST.
This is not a problem on 64bit because the IST default entry point maps to
IDTENTRY_RAW which does not any entry handling. The surplus function
declaration for the noist C entry point is unused and as there is no ASM
code emitted for NMI this went unnoticed.
On 32bit IDTENTRY_IST maps to a regular IDTENTRY which does the normal
entry handling. That is clearly the wrong thing to do for NMI.
Map it to IDTENTRY_RAW to unbreak it. The IDTENTRY_NMI mapping needs to
stay to avoid emitting ASM code.
Andy Lutomirski [Fri, 12 Jun 2020 03:26:38 +0000 (20:26 -0700)]
x86/entry: Treat BUG/WARN as NMI-like entries
BUG/WARN are cleverly optimized using UD2 to handle the BUG/WARN out of
line in an exception fixup.
But if BUG or WARN is issued in a funny RCU context, then the
idtentry_enter...() path might helpfully WARN that the RCU context is
invalid, which results in infinite recursion.
Split the BUG/WARN handling into an nmi_enter()/nmi_exit() path in
exc_invalid_op() to increase the chance to survive the experience.
[ tglx: Make the declaration match the implementation ]
Before commit 622f610569f8 ("powerpc/kvm/book3s: Use kvm helpers
to walk shadow or secondary table") we called __find_linux_pte() with
a page table pointer from a kvm_nested_guest struct but
now we rely on kvmhv_find_nested() which takes an L1 LPID and returns
a kvm_nested_guest pointer, however we pass a L0 LPID there and
the L2 guest hangs.
This fixes the LPID passed to kvmppc_hv_handle_set_rc().
Fixes: 622f610569f8 ("powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611030559.75257-1-aik@ozlabs.ru
Ley Foon Tan [Fri, 12 Jun 2020 06:04:49 +0000 (14:04 +0800)]
nios2: signal: Mark expected switch fall-through
Mark switch cases where we are expecting to fall through.
Fix the following warning through the use of the new the new
pseudo-keyword fallthrough;
arch/nios2/kernel/signal.c:254:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
254 | restart = -2;
| ~~~~~~~~^~~~
arch/nios2/kernel/signal.c:255:3: note: here
255 | case ERESTARTNOHAND:
| ^~~~
Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com>
Linus Torvalds [Fri, 12 Jun 2020 01:55:43 +0000 (18:55 -0700)]
Merge tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull the Kernel Concurrency Sanitizer from Thomas Gleixner:
"The Kernel Concurrency Sanitizer (KCSAN) is a dynamic race detector,
which relies on compile-time instrumentation, and uses a
watchpoint-based sampling approach to detect races.
The feature was under development for quite some time and has already
found legitimate bugs.
Unfortunately it comes with a limitation, which was only understood
late in the development cycle:
It requires an up to date CLANG-11 compiler
CLANG-11 is not yet released (scheduled for June), but it's the only
compiler today which handles the kernel requirements and especially
the annotations of functions to exclude them from KCSAN
instrumentation correctly.
These annotations really need to work so that low level entry code and
especially int3 text poke handling can be completely isolated.
A detailed discussion of the requirements and compiler issues can be
found here:
We came to the conclusion that trying to work around compiler
limitations and bugs again would end up in a major trainwreck, so
requiring a working compiler seemed to be the best choice.
For Continous Integration purposes the compiler restriction is
manageable and that's where most xxSAN reports come from.
For a change this limitation might make GCC people actually look at
their bugs. Some issues with CSAN in GCC are 7 years old and one has
been 'fixed' 3 years ago with a half baken solution which 'solved' the
reported issue but not the underlying problem.
The KCSAN developers also ponder to use a GCC plugin to become
independent, but that's not something which will show up in a few
days.
Blocking KCSAN until wide spread compiler support is available is not
a really good alternative because the continuous growth of lockless
optimizations in the kernel demands proper tooling support"
* tag 'locking-kcsan-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining
compiler.h: Move function attributes to compiler_types.h
compiler.h: Avoid nested statement expression in data_race()
compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
kcsan: Update Documentation to change supported compilers
kcsan: Remove 'noinline' from __no_kcsan_or_inline
kcsan: Pass option tsan-instrument-read-before-write to Clang
kcsan: Support distinguishing volatile accesses
kcsan: Restrict supported compilers
kcsan: Avoid inserting __tsan_func_entry/exit if possible
ubsan, kcsan: Don't combine sanitizer with kcov on clang
objtool, kcsan: Add kcsan_disable_current() and kcsan_enable_current_nowarn()
kcsan: Add __kcsan_{enable,disable}_current() variants
checkpatch: Warn about data_race() without comment
kcsan: Use GFP_ATOMIC under spin lock
Improve KCSAN documentation a bit
kcsan: Make reporting aware of KCSAN tests
kcsan: Fix function matching in report
kcsan: Change data_race() to no longer require marking racing accesses
kcsan: Move kcsan_{disable,enable}_current() to kcsan-checks.h
...
Linus Torvalds [Fri, 12 Jun 2020 01:27:19 +0000 (18:27 -0700)]
Merge tag 'locking-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull atomics rework from Thomas Gleixner:
"Peter Zijlstras rework of atomics and fallbacks. This solves two
problems:
1) Compilers uninline small atomic_* static inline functions which
can expose them to instrumentation.
2) The instrumentation of atomic primitives was done at the
architecture level while composites or fallbacks were provided at
the generic level. As a result there are no uninstrumented
variants of the fallbacks.
Both issues were in the way of fully isolating fragile entry code
pathes and especially the text poke int3 handler which is prone to an
endless recursion problem when anything in that code path is about to
be instrumented. This was always a problem, but got elevated due to
the new batch mode updates of tracing.
The solution is to mark the functions __always_inline and to flip the
fallback and instrumentation so the non-instrumented variants are at
the architecture level and the instrumentation is done in generic
code.
The latter introduces another fallback variant which will go away once
all architectures have been moved over to arch_atomic_*"
* tag 'locking-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/atomics: Flip fallbacks and instrumentation
asm-generic/atomic: Use __always_inline for fallback wrappers
Linus Torvalds [Fri, 12 Jun 2020 01:18:50 +0000 (18:18 -0700)]
Merge branch 'akpm' (patches from Andrew)
Pull updates from Andrew Morton:
"A few fixes and stragglers.
Subsystems affected by this patch series: mm/memory-failure, ocfs2,
lib/lzo, misc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
amdgpu: a NULL ->mm does not mean a thread is a kthread
lib/lzo: fix ambiguous encoding bug in lzo-rle
ocfs2: fix build failure when TCP/IP is disabled
mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread
mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
Dave Rodgman [Fri, 12 Jun 2020 00:34:54 +0000 (17:34 -0700)]
lib/lzo: fix ambiguous encoding bug in lzo-rle
In some rare cases, for input data over 32 KB, lzo-rle could encode two
different inputs to the same compressed representation, so that
decompression is then ambiguous (i.e. data may be corrupted - although
zram is not affected because it operates over 4 KB pages).
This modifies the compressor without changing the decompressor or the
bitstream format, such that:
- there is no change to how data produced by the old compressor is
decompressed
- an old decompressor will correctly decode data from the updated
compressor
- performance and compression ratio are not affected
- we avoid introducing a new bitstream format
In testing over 12.8M real-world files totalling 903 GB, three files
were affected by this bug. I also constructed 37M semi-random 64 KB
files totalling 2.27 TB, and saw no affected files. Finally I tested
over files constructed to contain each of the ~1024 possible bad input
sequences; for all of these cases, updated lzo-rle worked correctly.
There is no significant impact to performance or compression ratio.
Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Dave Rodgman <dave.rodgman@arm.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Chao Yu <yuchao0@huawei.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200507100203.29785-1-dave.rodgman@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tom Seewald [Fri, 12 Jun 2020 00:34:51 +0000 (17:34 -0700)]
ocfs2: fix build failure when TCP/IP is disabled
After commit 87c3f6de058e ("tcp: add tcp_sock_set_nodelay") and commit 313c15c1714c ("tcp: add tcp_sock_set_user_timeout"), building the kernel
with OCFS2_FS=y but without INET=y causes it to fail with:
ld: fs/ocfs2/cluster/tcp.o: in function `o2net_accept_many':
tcp.c:(.text+0x21b1): undefined reference to `tcp_sock_set_nodelay'
ld: tcp.c:(.text+0x21c1): undefined reference to `tcp_sock_set_user_timeout'
ld: fs/ocfs2/cluster/tcp.o: in function `o2net_start_connect':
tcp.c:(.text+0x2633): undefined reference to `tcp_sock_set_nodelay'
ld: tcp.c:(.text+0x2643): undefined reference to `tcp_sock_set_user_timeout'
This is due to tcp_sock_set_nodelay() and tcp_sock_set_user_timeout()
being declared in linux/tcp.h and defined in net/ipv4/tcp.c, which
depend on TCP/IP being enabled.
To fix this, make OCFS2_FS depend on INET=y which already requires
NET=y.
Fixes: 87c3f6de058e ("tcp: add tcp_sock_set_nodelay") Fixes: 313c15c1714c ("tcp: add tcp_sock_set_user_timeout") Signed-off-by: Tom Seewald <tseewald@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Link: http://lkml.kernel.org/r/20200606190827.23954-1-tseewald@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Fri, 12 Jun 2020 00:34:48 +0000 (17:34 -0700)]
mm/memory-failure: send SIGBUS(BUS_MCEERR_AR) only to current thread
Action Required memory error should happen only when a processor is
about to access to a corrupted memory, so it's synchronous and only
affects current process/thread.
Recently commit 0265d96adb1e ("mm, memory_failure: don't send
BUS_MCEERR_AO for action required error") fixed the issue that Action
Required memory could unnecessarily send SIGBUS to the processes which
share the error memory. But we still have another issue that we could
send SIGBUS to a wrong thread.
This is because collect_procs() and task_early_kill() fails to add the
current process to "to-kill" list. So this patch is suggesting to fix
it. With this fix, SIGBUS(BUS_MCEERR_AR) is never sent to non-current
process/thread.
Naoya Horiguchi [Fri, 12 Jun 2020 00:34:45 +0000 (17:34 -0700)]
mm/memory-failure: prioritize prctl(PR_MCE_KILL) over vm.memory_failure_early_kill
Patch series "hwpoison: fixes signaling on memory error"
This is a small patchset to solve issues in memory error handler to send
SIGBUS to proper process/thread as expected in configuration. Please
see descriptions in individual patches for more details.
This patch (of 2):
Early-kill policy is controlled from two types of settings, one is
per-process setting prctl(PR_MCE_KILL) and the other is system-wide
setting vm.memory_failure_early_kill. Users expect per-process setting
to override system-wide setting as many other settings do, but
early-kill setting doesn't work as such.
For example, if a system configures vm.memory_failure_early_kill to 1
(enabled), a process receives SIGBUS even if it's configured to
explicitly disable PF_MCE_KILL by prctl(). That's not desirable for
applications with their own policies.
This patch is suggesting to change the priority of these two types of
settings, by checking sysctl_memory_failure_early_kill only when a given
process has the default kill policy.
Note that this patch is solving a thread choice issue too.
Originally, collect_procs() always chooses the main thread when
vm.memory_failure_early_kill is 1, even if the process has a dedicated
thread for memory error handling. SIGBUS should be sent to the
dedicated thread if early-kill is enabled via
vm.memory_failure_early_kill as we are doing for PR_MCE_KILL_EARLY
processes.
Linus Torvalds [Thu, 11 Jun 2020 23:10:08 +0000 (16:10 -0700)]
Merge tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few late stragglers in here. In particular:
- Validate full range for provided buffers (Bijan)
- Fix bad use of kfree() in buffer registration failure (Denis)
- Don't allow close of ring itself, it's not fully safe. Making it
fully safe would require making the system call more expensive,
which isn't worth it.
- Buffer selection fix
- Regression fix for O_NONBLOCK retry
- Make IORING_OP_ACCEPT honor O_NONBLOCK (Jiufei)
- Restrict opcode handling for SQ/IOPOLL (Pavel)
- io-wq work handling cleanups and improvements (Pavel, Xiaoguang)
- IOPOLL race fix (Xiaoguang)"
* tag 'io_uring-5.8-2020-06-11' of git://git.kernel.dk/linux-block:
io_uring: fix io_kiocb.flags modification race in IOPOLL mode
io_uring: check file O_NONBLOCK state for accept
io_uring: avoid unnecessary io_wq_work copy for fast poll feature
io_uring: avoid whole io_wq_work copy for requests completed inline
io_uring: allow O_NONBLOCK async retry
io_wq: add per-wq work handler instead of per work
io_uring: don't arm a timeout through work.func
io_uring: remove custom ->func handlers
io_uring: don't derive close state from ->func
io_uring: use kvfree() in io_sqe_buffer_register()
io_uring: validate the full range of provided buffers for access
io_uring: re-set iov base/len for buffer select retry
io_uring: move send/recv IOPOLL check into prep
io_uring: deduplicate io_openat{,2}_prep()
io_uring: do build_open_how() only once
io_uring: fix {SQ,IO}POLL with unsupported opcodes
io_uring: disallow close of ring itself
Linus Torvalds [Thu, 11 Jun 2020 23:07:33 +0000 (16:07 -0700)]
Merge tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Some followup fixes for this merge window. In particular:
- Seqcount write missing preemption disable for stats (Ahmed)
- blktrace fixes (Chaitanya)
- Redundant initializations (Colin)
- Various small NVMe fixes (Chaitanya, Christoph, Daniel, Max,
Niklas, Rikard)
- loop flag bug regression fix (Martijn)
- blk-mq tagging fixes (Christoph, Ming)"
* tag 'block-5.8-2020-06-11' of git://git.kernel.dk/linux-block:
umem: remove redundant initialization of variable ret
pktcdvd: remove redundant initialization of variable ret
nvmet: fail outstanding host posted AEN req
nvme-pci: use simple suspend when a HMB is enabled
nvme-fc: don't call nvme_cleanup_cmd() for AENs
nvmet-tcp: constify nvmet_tcp_ops
nvme-tcp: constify nvme_tcp_mq_ops and nvme_tcp_admin_mq_ops
nvme: do not call del_gendisk() on a disk that was never added
blk-mq: fix blk_mq_all_tag_iter
blk-mq: split out a __blk_mq_get_driver_tag helper
blktrace: fix endianness for blk_log_remap()
blktrace: fix endianness in get_pdu_int()
blktrace: use errno instead of bi_status
block: nr_sects_write(): Disable preemption on seqcount write
block: remove the error argument to the block_bio_complete tracepoint
loop: Fix wrong masking of status flags
block/bio-integrity: don't free 'buf' if bio_integrity_add_page() failed
David Howells [Thu, 11 Jun 2020 20:50:24 +0000 (21:50 +0100)]
afs: Fix afs_store_data() to set mtime in new operation descriptor
Fix afs_store_data() so that it sets the mtime in the new operation
descriptor otherwise the mtime on the server gets set to 0 when a write is
stored to the server.
Fixes: 590fdde7416c ("afs: Build an abstraction around an "operation" concept") Reported-by: Dave Botsch <botsch@cnf.cornell.edu> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 11 Jun 2020 22:54:31 +0000 (15:54 -0700)]
Merge tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Thomas Gleixner:
"A set of fixes and updates for x86:
- Unbreak paravirt VDSO clocks.
While the VDSO code was moved into lib for sharing a subtle check
for the validity of paravirt clocks got replaced. While the
replacement works perfectly fine for bare metal as the update of
the VDSO clock mode is synchronous, it fails for paravirt clocks
because the hypervisor can invalidate them asynchronously.
Bring it back as an optional function so it does not inflict this
on architectures which are free of PV damage.
- Fix the jiffies to jiffies64 mapping on 64bit so it does not
trigger an ODR violation on newer compilers
- Three fixes for the SSBD and *IB* speculation mitigation maze to
ensure consistency, not disabling of some *IB* variants wrongly and
to prevent a rogue cross process shutdown of SSBD. All marked for
stable.
- Add yet more CPU models to the splitlock detection capable list
!@#%$!
- Bring the pr_info() back which tells that TSC deadline timer is
enabled.
- Reboot quirk for MacBook6,1"
* tag 'x86-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Unbreak paravirt VDSO clocks
lib/vdso: Provide sanity check for cycles (again)
clocksource: Remove obsolete ifdef
x86_64: Fix jiffies ODR violation
x86/speculation: PR_SPEC_FORCE_DISABLE enforcement for indirect branches.
x86/speculation: Prevent rogue cross-process SSBD shutdown
x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.
x86/cpu: Add Sapphire Rapids CPU model number
x86/split_lock: Add Icelake microserver and Tigerlake CPU models
x86/apic: Make TSC deadline timer detection message visible
x86/reboot/quirks: Add MacBook6,1 reboot quirk
Linus Torvalds [Thu, 11 Jun 2020 22:36:28 +0000 (15:36 -0700)]
Merge tag 'timers-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A small fix for the VDSO code to force inline
__cvdso_clock_gettime_common() so the compiler
can't generate horrible code"
* tag 'timers-urgent-2020-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lib/vdso: Force inlining of __cvdso_clock_gettime_common()
Linus Torvalds [Thu, 11 Jun 2020 20:25:53 +0000 (13:25 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge some more updates from Andrew Morton:
- various hotfixes and minor things
- hch's use_mm/unuse_mm clearnups
Subsystems affected by this patch series: mm/hugetlb, scripts, kcov,
lib, nilfs, checkpatch, lib, mm/debug, ocfs2, lib, misc.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
kernel: set USER_DS in kthread_use_mm
kernel: better document the use_mm/unuse_mm API contract
kernel: move use_mm/unuse_mm to kthread.c
kernel: move use_mm/unuse_mm to kthread.c
stacktrace: cleanup inconsistent variable type
lib: test get_count_order/long in test_bitops.c
mm: add comments on pglist_data zones
ocfs2: fix spelling mistake and grammar
mm/debug_vm_pgtable: fix kernel crash by checking for THP support
lib: fix bitmap_parse() on 64-bit big endian archs
checkpatch: correct check for kernel parameters doc
nilfs2: fix null pointer dereference at nilfs_segctor_do_construct()
lib/lz4/lz4_decompress.c: document deliberate use of `&'
kcov: check kcov_softirq in kcov_remote_stop()
scripts/spelling: add a few more typos
khugepaged: selftests: fix timeout condition in wait_for_scan()
Rob Herring [Thu, 11 Jun 2020 14:58:04 +0000 (08:58 -0600)]
dt-bindings: Fix more incorrect 'reg' property sizes in examples
The examples template is a 'simple-bus' with a size of 1 cell for
had between 2 and 4 cells which really only errors on I2C or SPI type
devices with a single cell.
The easiest fix in most cases is to change the 'reg' property to 1 cell
for address and size.
Cc: "Heiko Stübner" <heiko@sntech.de> Cc: Ezequiel Garcia <ezequiel@collabora.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Richard Weinberger <richard@nod.at> Cc: "David S. Miller" <davem@davemloft.net> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Kishon Vijay Abraham I <kishon@ti.com> Cc: Vinod Koul <vkoul@kernel.org> Cc: Liam Girdwood <lgirdwood@gmail.com> Cc: linux-rockchip@lists.infradead.org Cc: linux-media@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: netdev@vger.kernel.org Cc: alsa-devel@alsa-project.org Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org>
Joerg Roedel [Thu, 11 Jun 2020 09:11:39 +0000 (11:11 +0200)]
alpha: Fix build around srm_sysrq_reboot_op
The patch introducing the struct was probably never compile tested,
because it sets a handler with a wrong function signature. Wrap the
handler into a functions with the correct signature to fix the build.
Fixes: 866e565ae70f ("tty/sysrq: alpha: export and use __sysrq_get_key_op()") Cc: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 11 Jun 2020 19:55:20 +0000 (12:55 -0700)]
Merge tag 'riscv-for-linus-5.8-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Kconfig select statements are now sorted alphanumerically
- first-level interrupts are now handled via a full irqchip driver
- CPU hotplug is fixed
- vDSO calls now use the common vDSO infrastructure
* tag 'riscv-for-linus-5.8-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: set the permission of vdso_data to read-only
riscv: use vDSO common flow to reduce the latency of the time-related functions
riscv: fix build warning of missing prototypes
RISC-V: Don't mark init section as non-executable
RISC-V: Force select RISCV_INTC for CONFIG_RISCV
RISC-V: Remove do_IRQ() function
clocksource/drivers/timer-riscv: Use per-CPU timer interrupt
irqchip: RISC-V per-HART local interrupt controller driver
RISC-V: Rename and move plic_find_hart_id() to arch directory
RISC-V: self-contained IPI handling routine
RISC-V: Sort select statements alphanumerically
- Fix incorrect mask in HiSilicon L3C perf PMU driver
- Fix compat vDSO compilation under some toolchain configurations
- Fix false UBSAN warning from ACPI IORT parsing code
- Fix booting under bootloaders that ignore TEXT_OFFSET
- Annotate debug initcall function with '__init'"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: warn on incorrect placement of the kernel by the bootloader
arm64: acpi: fix UBSAN warning
arm64: vdso32: add CONFIG_THUMB2_COMPAT_VDSO
drivers/perf: hisi: Fix wrong value for all counters enable
arm64: ftrace: Change CONFIG_FTRACE_WITH_REGS to CONFIG_DYNAMIC_FTRACE_WITH_REGS
arm64: debug: mark a function as __init to save some memory
scs: Report SCS usage in bytes rather than number of entries
Linus Torvalds [Thu, 11 Jun 2020 19:50:54 +0000 (12:50 -0700)]
Merge tag 'm68knommu-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu updates from Greg Ungerer:
- casting clean up in the user access macros
- memory leak on error case fix for PCI probing
- update of a defconfig
* tag 'm68knommu-for-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k,nommu: fix implicit cast from __user in __{get,put}_user_asm()
m68k,nommu: add missing __user in uaccess' __ptr() macro
m68k: Drop CONFIG_MTD_M25P80 in stmark2_defconfig
m68k/PCI: Fix a memory leak in an error handling path
Rob Herring [Thu, 11 Jun 2020 14:52:38 +0000 (08:52 -0600)]
dt-bindings: phy: qcom: Fix missing 'ranges' and example addresses
The QCom QMP PHY bindings have child nodes with translatable (MMIO)
addresses, so a 'ranges' property is required in the parent node.
Additionally, the examples default to 1 address and size cell, so let's
fix that, too.
Fixes: 4f8cd3f42483 ("dt-bindings: phy: qcom,qmp: Convert QMP PHY bindings to yaml") Cc: Andy Gross <agross@kernel.org> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Cc: Kishon Vijay Abraham I <kishon@ti.com> Cc: Vinod Koul <vkoul@kernel.org> Cc: Manu Gautam <mgautam@codeaurora.org> Cc: linux-arm-msm@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
Rob Herring [Wed, 10 Jun 2020 21:49:12 +0000 (15:49 -0600)]
dt-bindings: Remove more cases of 'allOf' containing a '$ref'
Another round of 'allOf' removals that came in this cycle.
json-schema versions draft7 and earlier have a weird behavior in that
any keywords combined with a '$ref' are ignored (silently). The correct
form was to put a '$ref' under an 'allOf'. This behavior is now changed
in the 2019-09 json-schema spec and '$ref' can be mixed with other
keywords. The json-schema library doesn't yet support this, but the
tooling now does a fixup for this and either way works.
This has been a constant source of review comments, so let's change this
treewide so everyone copies the simpler syntax.
Linus Torvalds [Thu, 11 Jun 2020 19:42:14 +0000 (12:42 -0700)]
Merge tag 'mailbox-v5.8' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
"qcom:
- new controller driver for IPCC
- reorg the of_device data
- add support for ipq6018 platform
spreadtrum:
- new sprd controller driver
imx:
- implement suspend/resume PM support
misc:
- make pcc driver struct static
- fix return value in imx_mu_scu
- disable clock before bailout in imx probe
- remove duplicate error mssg in zynqmp probe
- fix header size in imx.scu
- check for null instead of is-err in zynqmp"
* tag 'mailbox-v5.8' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: qcom: Add ipq6018 apcs compatible
mailbox: qcom: Add clock driver name in apcs mailbox driver data
dt-bindings: mailbox: Add YAML schemas for QCOM APCS global block
mailbox: imx: ONLY IPC MU needs IRQF_NO_SUSPEND flag
mailbox: imx: Add runtime PM callback to handle MU clocks
mailbox: imx: Add context save/restore for suspend/resume
MAINTAINERS: Add entry for Qualcomm IPCC driver
mailbox: Add support for Qualcomm IPCC
dt-bindings: mailbox: Add devicetree binding for Qcom IPCC
mailbox: zynqmp-ipi: Fix NULL vs IS_ERR() check in zynqmp_ipi_mbox_probe()
mailbox: imx-mailbox: fix scu msg header size check
mailbox: sprd: Add Spreadtrum mailbox driver
dt-bindings: mailbox: Add the Spreadtrum mailbox documentation
mailbox: ZynqMP IPI: Delete an error message in zynqmp_ipi_probe()
mailbox: imx: Disable the clock on devm_mbox_controller_register() failure
mailbox: imx: Fix return in imx_mu_scu_xlate()
mailbox: imx: Support runtime PM
mailbox: pcc: make pcc_mbox_driver static
Linus Torvalds [Thu, 11 Jun 2020 19:38:11 +0000 (12:38 -0700)]
Merge tag 'sound-fix-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are last-minute fixes gathered before merge window close; a few
fixes are for the core while the rest majority are driver fixes.
- PCM locking annotation fixes and the possible self-lock fix
- ASoC DPCM regression fixes with multi-CPU DAI
- A fix for inconsistent resume from system-PM on USB-audio
- Improved runtime-PM handling with multiple USB interfaces
- Quirks for HD-audio and USB-audio
- Hardened firmware handling in max98390 codec
- A couple of fixes for meson"
* tag 'sound-fix-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ASoC: rt5645: Add platform-data for Asus T101HA
ASoC: Intel: bytcr_rt5640: Add quirk for Toshiba Encore WT10-A tablet
ASoC: SOF: nocodec: conditionally set dpcm_capture/dpcm_playback flags
ASoC: Intel: boards: replace capture_only by dpcm_capture
ASoC: core: only convert non DPCM link to DPCM link
ASoC: soc-pcm: dpcm: fix playback/capture checks
ASoC: meson: add missing free_irq() in error path
ALSA: pcm: disallow linking stream to itself
ALSA: usb-audio: Manage auto-pm of all bundled interfaces
ALSA: hda/realtek - add a pintbl quirk for several Lenovo machines
ALSA: pcm: fix snd_pcm_link() lockdep splat
ALSA: usb-audio: Use the new macro for HP Dock rename quirks
ALSA: usb-audio: Add vendor, product and profile name for HP Thunderbolt Dock
ALSA: emu10k1: delete an unnecessary condition
dt-bindings: ASoc: Fix tdm-slot documentation spelling error
ASoC: meson: fix memory leak of links if allocation of ldata fails
ALSA: usb-audio: Fix inconsistent card PM state after resume
ASoC: max98390: Fix potential crash during param fw loading
ASoC: max98390: Fix incorrect printf qualifier
ASoC: fsl-asoc-card: Defer probe when fail to find codec device
...
Linus Torvalds [Thu, 11 Jun 2020 19:27:06 +0000 (12:27 -0700)]
Merge tag 'drm-next-2020-06-11-1' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"One sun4i fix and a connector hotplug race The ast fix is for a
regression in 5.6, and one of the i915 ones fixes an oops reported by
dhowells.
core:
- fix race in connectors sending hotplug
i915:
- Avoid use after free in cmdparser
- Avoid NULL dereference when probing all display encoders
- Fixup to module parameter type
sun4i:
- clock divider fix
ast:
- 24/32 bpp mode setting fix"
* tag 'drm-next-2020-06-11-1' of git://anongit.freedesktop.org/drm/drm:
drm/ast: fix missing break in switch statement for format->cpp[0] case 4
drm/sun4i: hdmi ddc clk: Fix size of m divider
drm/i915/display: Only query DP state of a DDI encoder
drm/i915/params: fix i915.reset module param type
drm/i915/gem: Mark the buffer pool as active for the cmdparser
drm/connector: notify userspace on hotplug after register complete
Linus Torvalds [Thu, 11 Jun 2020 19:22:41 +0000 (12:22 -0700)]
Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New features and improvements:
- Sunrpc receive buffer sizes only change when establishing a GSS credentials
- Add more sunrpc tracepoints
- Improve on tracepoints to capture internal NFS I/O errors
Other bugfixes and cleanups:
- Move a dprintk() to after a call to nfs_alloc_fattr()
- Fix off-by-one issues in rpc_ntop6
- Fix a few coccicheck warnings
- Use the correct SPDX license identifiers
- Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
- Replace zero-length array with flexible array
- Remove duplicate headers
- Set invalid blocks after NFSv4 writes to update space_used attribute
- Fix direct WRITE throughput regression"
* tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
NFS: Fix direct WRITE throughput regression
SUNRPC: rpc_xprt lifetime events should record xprt->state
xprtrdma: Make xprt_rdma_slot_table_entries static
nfs: set invalid blocks after NFSv4 writes
NFS: remove redundant initialization of variable result
sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
NFS: Add a tracepoint in nfs_set_pgio_error()
NFS: Trace short NFS READs
NFS: nfs_xdr_status should record the procedure name
SUNRPC: Set SOFTCONN when destroying GSS contexts
SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
SUNRPC: trace RPC client lifetime events
SUNRPC: Trace transport lifetime events
SUNRPC: Split the xdr_buf event class
SUNRPC: Add tracepoint to rpc_call_rpcerror()
SUNRPC: Update the RPC_SHOW_SOCKET() macro
SUNRPC: Update the rpc_show_task_flags() macro
SUNRPC: Trace GSS context lifetimes
SUNRPC: receive buffer size estimation values almost never change
...
Marco Elver [Thu, 21 May 2020 14:20:45 +0000 (16:20 +0200)]
compiler.h: Avoid nested statement expression in data_race()
It appears that compilers have trouble with nested statement
expressions. Therefore, remove one level of statement expression nesting
from the data_race() macro. This will help avoiding potential problems
in the future as its usage increases.
Reported-by: Borislav Petkov <bp@suse.de> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lkml.kernel.org/r/20200520221712.GA21166@zn.tnic Link: https://lkml.kernel.org/r/20200521142047.169334-10-elver@google.com
Marco Elver [Thu, 21 May 2020 14:20:44 +0000 (16:20 +0200)]
compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE()
The volatile accesses no longer need to be wrapped in data_race()
because compilers that emit instrumentation distinguishing volatile
accesses are required for KCSAN.
Consequently, the explicit kcsan_check_atomic*() are no longer required
either since the compiler emits instrumentation distinguishing the
volatile accesses.
Finally, simplify __READ_ONCE_SCALAR() and remove __WRITE_ONCE_SCALAR().
[ bp: Convert commit message to passive voice. ]
Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/20200521142047.169334-9-elver@google.com
Marco Elver [Thu, 21 May 2020 14:20:41 +0000 (16:20 +0200)]
kcsan: Remove 'noinline' from __no_kcsan_or_inline
Some compilers incorrectly inline small __no_kcsan functions, which then
results in instrumenting the accesses. For this reason, the 'noinline'
attribute was added to __no_kcsan_or_inline. All known versions of GCC
are affected by this. Supported versions of Clang are unaffected, and
never inline a no_sanitize function.
However, the attribute 'noinline' in __no_kcsan_or_inline causes
unexpected code generation in functions that are __no_kcsan and call a
__no_kcsan_or_inline function.
In certain situations it is expected that the __no_kcsan_or_inline
function is actually inlined by the __no_kcsan function, and *no* calls
are emitted. By removing the 'noinline' attribute, give the compiler
the ability to inline and generate the expected code in __no_kcsan
functions.
Marco Elver [Thu, 21 May 2020 14:20:40 +0000 (16:20 +0200)]
kcsan: Pass option tsan-instrument-read-before-write to Clang
Clang (unlike GCC) removes reads before writes with matching addresses
in the same basic block. This is an optimization for TSAN, since writes
will always cause conflict if the preceding read would have.
However, for KCSAN we cannot rely on this option, because we apply
several special rules to writes, in particular when the
KCSAN_ASSUME_PLAIN_WRITES_ATOMIC option is selected. To avoid missing
potential data races, pass the -tsan-instrument-read-before-write option
to Clang if it is available [1].
Marco Elver [Thu, 21 May 2020 14:20:39 +0000 (16:20 +0200)]
kcsan: Support distinguishing volatile accesses
In the kernel, the "volatile" keyword is used in various concurrent
contexts, whether in low-level synchronization primitives or for
legacy reasons. If supported by the compiler, it will be assumed
that aligned volatile accesses up to sizeof(long long) (matching
compiletime_assert_rwonce_type()) are atomic.
Recent versions of Clang [1] (GCC tentative [2]) can instrument
volatile accesses differently. Add the option (required) to enable the
instrumentation, and provide the necessary runtime functions. None of
the updated compilers are widely available yet (Clang 11 will be the
first release to support the feature).
Marco Elver [Thu, 21 May 2020 14:20:42 +0000 (16:20 +0200)]
kcsan: Restrict supported compilers
The first version of Clang that supports -tsan-distinguish-volatile will
be able to support KCSAN. The first Clang release to do so, will be
Clang 11. This is due to satisfying all the following requirements:
1. Never emit calls to __tsan_func_{entry,exit}.
2. __no_kcsan functions should not call anything, not even
kcsan_{enable,disable}_current(), when using __{READ,WRITE}_ONCE => Requires
leaving them plain!
3. Support atomic_{read,set}*() with KCSAN, which rely on
arch_atomic_{read,set}*() using __{READ,WRITE}_ONCE() => Because of
#2, rely on Clang 11's -tsan-distinguish-volatile support. We will
double-instrument atomic_{read,set}*(), but that's reasonable given
it's still lower cost than the data_race() variant due to avoiding 2
extra calls (kcsan_{en,dis}able_current() calls).
4. __always_inline functions inlined into __no_kcsan functions are never
instrumented.
5. __always_inline functions inlined into instrumented functions are
instrumented.
6. __no_kcsan_or_inline functions may be inlined into __no_kcsan functions =>
Implies leaving 'noinline' off of __no_kcsan_or_inline.
7. Because of #6, __no_kcsan and __no_kcsan_or_inline functions should never be
spuriously inlined into instrumented functions, causing the accesses of the
__no_kcsan function to be instrumented.
Older versions of Clang do not satisfy #3. The latest GCC currently
doesn't support at least #1, #3, and #7.
Marco Elver [Thu, 21 May 2020 14:20:38 +0000 (16:20 +0200)]
kcsan: Avoid inserting __tsan_func_entry/exit if possible
To avoid inserting __tsan_func_{entry,exit}, add option if supported by
compiler. Currently only Clang can be told to not emit calls to these
functions. It is safe to not emit these, since KCSAN does not rely on
them.
Note that, if we disable __tsan_func_{entry,exit}(), we need to disable
tail-call optimization in sanitized compilation units, as otherwise we
may skip frames in the stack trace; in particular when the tail called
function is one of the KCSAN's runtime functions, and a report is
generated, we might miss the function where the actual access occurred.
Since __tsan_func_{entry,exit}() insertion effectively disabled
tail-call optimization, there should be no observable change.
This was caught and confirmed with kcsan-test & UNWINDER_ORC.
Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200521142047.169334-3-elver@google.com
Thomas Gleixner [Thu, 11 Jun 2020 18:02:46 +0000 (20:02 +0200)]
Rebase locking/kcsan to locking/urgent
Merge the state of the locking kcsan branch before the read/write_once()
and the atomics modifications got merged.
Squash the fallout of the rebase on top of the read/write once and atomic
fallback work into the merge. The history of the original branch is
preserved in tag locking-kcsan-2020-06-02.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Paolo Bonzini [Thu, 11 Jun 2020 18:02:32 +0000 (14:02 -0400)]
Merge tag 'kvmarm-fixes-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for Linux 5.8, take #1
* 32bit VM fixes:
- Fix embarassing mapping issue between AArch32 CSSELR and AArch64
ACTLR
- Add ACTLR2 support for AArch32
- Get rid of the useless ACTLR_EL1 save/restore
- Fix CP14/15 accesses for AArch32 guests on BE hosts
- Ensure that we don't loose any state when injecting a 32bit
exception when running on a VHE host
* 64bit VM fixes:
- Fix PtrAuth host saving happening in preemptible contexts
- Optimize PtrAuth lazy enable
- Drop vcpu to cpu context pointer
- Fix sparse warnings for HYP per-CPU accesses
Paolo Bonzini [Thu, 11 Jun 2020 18:01:51 +0000 (14:01 -0400)]
KVM: x86: do not pass poisoned hva to __kvm_set_memory_region
__kvm_set_memory_region does not use the hva at all, so trying to
catch use-after-delete is pointless and, worse, it fails access_ok
now that we apply it to all memslots including private kernel ones.
This fixes an AVIC regression.
Fixes: eafc16c93920 ("KVM: check userspace_addr for all memslots") Reported-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linus Torvalds [Thu, 11 Jun 2020 17:48:12 +0000 (10:48 -0700)]
Merge tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull DAX updates part three from Darrick Wong:
"Now that the xfs changes have landed, this third piece changes the
FS_XFLAG_DAX ioctl code in xfs to request that the inode be reloaded
after the last program closes the file, if doing so would make a S_DAX
change happen. The goal here is to make dax access mode switching
quicker when possible.
Summary:
- Teach XFS to ask the VFS to drop an inode if the administrator
changes the FS_XFLAG_DAX inode flag such that the S_DAX state would
change. This can result in files changing access modes without
requiring an unmount cycle"
* tag 'vfs-5.8-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
fs/xfs: Combine xfs_diflags_to_linux() and xfs_diflags_to_iflags()
fs/xfs: Create function xfs_inode_should_enable_dax()
fs/xfs: Make DAX mount option a tri-state
fs/xfs: Change XFS_MOUNT_DAX to XFS_MOUNT_DAX_ALWAYS
fs/xfs: Remove unnecessary initialization of i_rwsem
Chuck Lever [Fri, 29 May 2020 18:14:40 +0000 (14:14 -0400)]
NFS: Fix direct WRITE throughput regression
I measured a 50% throughput regression for large direct writes.
The observed on-the-wire behavior is that the client sends every
NFS WRITE twice: once as an UNSTABLE WRITE plus a COMMIT, and once
as a FILE_SYNC WRITE.
This is because the nfs_write_match_verf() check in
nfs_direct_commit_complete() fails for every WRITE.
Buffered writes use nfs_write_completion(), which sets req->wb_verf
correctly. Direct writes use nfs_direct_write_completion(), which
does not set req->wb_verf at all. This leaves req->wb_verf set to
all zeroes for every direct WRITE, and thus
nfs_direct_commit_completion() always sets NFS_ODIRECT_RESCHED_WRITES.
This fix appears to restore nearly all of the lost performance.
Fixes: c9b255e88761 ("NFS: Fix O_DIRECT commit verifier handling") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Zou Wei [Thu, 23 Apr 2020 07:10:02 +0000 (15:10 +0800)]
xprtrdma: Make xprt_rdma_slot_table_entries static
Fix the following sparse warning:
net/sunrpc/xprtrdma/transport.c:71:14: warning: symbol 'xprt_rdma_slot_table_entries'
was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Zheng Bin [Thu, 21 May 2020 09:17:21 +0000 (17:17 +0800)]
nfs: set invalid blocks after NFSv4 writes
Use the following command to test nfsv4(size of file1M is 1MB):
mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt
cp file1M /mnt
du -h /mnt/file1M -->0 within 60s, then 1M
When write is done(cp file1M /mnt), will call this:
nfs_writeback_done
nfs4_write_done
nfs4_write_done_cb
nfs_writeback_update_inode
nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime
nfs_post_op_update_inode_force_wcc_locked
nfs_set_cache_invalid
nfs_refresh_inode_locked
nfs_update_inode
nfsd write response contains change, ctime, mtime, the flag will be
clear after nfs_update_inode. Howerver, write response does not contain
space_used, previous open response contains space_used whose value is 0,
so inode->i_blocks is still 0.
nfs_getattr -->called by "du -h"
do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false
if (do_update) {
__nfs_revalidate_inode
}
Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M"
is 0.
Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done.
Fixes: 39b2229a9f68 ("NFS: More fine grained attribute tracking") Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Colin Ian King [Wed, 27 May 2020 12:56:11 +0000 (13:56 +0100)]
NFS: remove redundant initialization of variable result
The variable result is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 12 May 2020 21:14:05 +0000 (17:14 -0400)]
NFS: Trace short NFS READs
A short read can generate an -EIO error without there being an error
on the wire. This tracepoint acts as an eyecatcher when there is no
obvious I/O error.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit c04f0e0ce679 ("NFS/NFSD/SUNRPC: replace generic creds with
'struct cred'.") made rpc_call_null_helper() set RPC_TASK_NULLCREDS
unconditionally. Therefore there's no need for
rpc_call_null_helper()'s call sites to set RPC_TASK_NULLCREDS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 12 May 2020 21:13:39 +0000 (17:13 -0400)]
SUNRPC: trace RPC client lifetime events
The "create" tracepoint records parts of the rpc_create arguments,
and the shutdown tracepoint records when the rpc_clnt is about to
signal pending tasks and destroy auths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 12 May 2020 21:13:28 +0000 (17:13 -0400)]
SUNRPC: Split the xdr_buf event class
To help tie the recorded xdr_buf to a particular RPC transaction,
the client side version of this class should display task ID
information and the server side one should show the request's XID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 12 May 2020 21:13:01 +0000 (17:13 -0400)]
SUNRPC: receive buffer size estimation values almost never change
Avoid unnecessary cache sloshing by placing the buffer size
estimation update logic behind an atomic bit flag.
The size of GSS information included in each wrapped Reply does
not change during the lifetime of a GSS context. Therefore, the
au_rslack and au_ralign fields need to be updated only once after
establishing a fresh GSS credential.
Thus a slack size update must occur after a cred is created,
duplicated, renewed, or expires. I'm not sure I have this exactly
right. A trace point is introduced to track updates to these
variables to enable troubleshooting the problem if I missed a spot.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Vitaly Kuznetsov [Wed, 10 Jun 2020 16:41:16 +0000 (18:41 +0200)]
KVM: selftests: fix sync_with_host() in smm_test
It was reported that older GCCs compile smm_test in a way that breaks
it completely:
kvm_exit: reason EXIT_CPUID rip 0x4014db info 0 0
func 7ffffffd idx 830 rax 0 rbx 0 rcx 0 rdx 0, cpuid entry not found
...
kvm_exit: reason EXIT_MSR rip 0x40abd9 info 0 0
kvm_msr: msr_read 487 = 0x0 (#GP)
...
Note, '7ffffffd' was supposed to be '80000001' as we're checking for
SVM. Dropping '-O2' from compiler flags help. Turns out, asm block in
sync_with_host() is wrong. We us 'in 0xe, %%al' instruction to sync
with the host and in 'AL' register we actually pass the parameter
(stage) but after sync 'AL' gets written to but GCC thinks the value
is still there and uses it to compute 'EAX' for 'cpuid'.
smm_test can't fully use standard ucall() framework as we need to
write a very simple SMI handler there. Fix the immediate issue by
making RAX input/output operand. While on it, make sync_with_host()
static inline.
Reported-by: Marcelo Bandeira Condotta <mcondotta@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610164116.770811-1-vkuznets@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>