]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
4 years agoarm64: mm: add p?d_leaf() definitions
Steven Price [Tue, 4 Feb 2020 01:35:14 +0000 (17:35 -0800)]
arm64: mm: add p?d_leaf() definitions

walk_page_range() is going to be allowed to walk page tables other than
those of user space.  For this it needs to know when it has reached a
'leaf' entry in the page tables.  This information will be provided by the
p?d_leaf() functions/macros.

For arm64, we already have p?d_sect() macros which we can reuse for
p?d_leaf().

pud_sect() is defined as a dummy function when CONFIG_PGTABLE_LEVELS < 3
or CONFIG_ARM64_64K_PAGES is defined.  However when the kernel is
configured this way then architecturally it isn't allowed to have a large
page at this level, and any code using these page walking macros is
implicitly relying on the page size/number of levels being the same as the
kernel.  So it is safe to reuse this for p?d_leaf() as it is an
architectural restriction.

Link: http://lkml.kernel.org/r/20191218162402.45610-5-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoarm: mm: add p?d_leaf() definitions
Steven Price [Tue, 4 Feb 2020 01:35:10 +0000 (17:35 -0800)]
arm: mm: add p?d_leaf() definitions

walk_page_range() is going to be allowed to walk page tables other than
those of user space.  For this it needs to know when it has reached a
'leaf' entry in the page tables.  This information is provided by the
p?d_leaf() functions/macros.

For arm pmd_large() already exists and does what we want.  So simply
provide the generic pmd_leaf() name.

Link: http://lkml.kernel.org/r/20191218162402.45610-4-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoarc: mm: add p?d_leaf() definitions
Steven Price [Tue, 4 Feb 2020 01:35:06 +0000 (17:35 -0800)]
arc: mm: add p?d_leaf() definitions

walk_page_range() is going to be allowed to walk page tables other than
those of user space.  For this it needs to know when it has reached a
'leaf' entry in the page tables.  This information will be provided by the
p?d_leaf() functions/macros.

For arc, we only have two levels, so only pmd_leaf() is needed.

Link: http://lkml.kernel.org/r/20191218162402.45610-3-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: add generic p?d_leaf() macros
Steven Price [Tue, 4 Feb 2020 01:35:01 +0000 (17:35 -0800)]
mm: add generic p?d_leaf() macros

Patch series "Generic page walk and ptdump", v17.

Many architectures current have a debugfs file for dumping the kernel page
tables.  Currently each architecture has to implement custom functions for
this because the details of walking the page tables used by the kernel are
different between architectures.

This series extends the capabilities of walk_page_range() so that it can
deal with the page tables of the kernel (which have no VMAs and can
contain larger huge pages than exist for user space).  A generic PTDUMP
implementation is the implemented making use of the new functionality of
walk_page_range() and finally arm64 and x86 are switch to using it,
removing the custom table walkers.

To enable a generic page table walker to walk the unusual mappings of the
kernel we need to implement a set of functions which let us know when the
walker has reached the leaf entry.  After a suggestion from Will Deacon
I've chosen the name p?d_leaf() as this (hopefully) describes the purpose
(and is a new name so has no historic baggage).  Some architectures have
p?d_large macros but this is easily confused with "large pages".

This series ends with a generic PTDUMP implemention for arm64 and x86.

Mostly this is a clean up and there should be very little functional
change.  The exceptions are:

* arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).

* arm64 has the ability to efficiently process KASAN pages (which
  previously only x86 implemented).  This means that the combination of
  KASAN and DEBUG_WX is now useable.

This patch (of 23):

Exposing the pud/pgd levels of the page tables to walk_page_range() means
we may come across the exotic large mappings that come with large areas of
contiguous memory (such as the kernel's linear map).

For architectures that don't provide all p?d_leaf() macros, provide
generic do nothing default that are suitable where there cannot be leaf
pages at that level.  Futher patches will add implementations for
individual architectures.

The name p?d_leaf() is chosen to minimize the confusion with existing uses
of "large" pages and "huge" pages which do not necessary mean that the
entry is a leaf (for example it may be a set of contiguous entries that
only take 1 TLB slot).  For the purpose of walking the page tables we
don't need to know how it will be represented in the TLB, but we do need
to know for sure if it is a leaf of the tree.

Link: http://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
Signed-off-by: Steven Price <steven.price@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Zong Li <zong.li@sifive.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: remove __krealloc
Florian Westphal [Tue, 4 Feb 2020 01:34:58 +0000 (17:34 -0800)]
mm: remove __krealloc

Since 5.5-rc1 the last user of this function is gone, so remove the
functionality.

See commit
0b331c716b85 ("netfilter: conntrack: free extension area immediately")
for details.

Link: http://lkml.kernel.org/r/20191212223442.22141-1-fw@strlen.de
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agopinctrl: fix pxa2xx.c build warnings
Randy Dunlap [Tue, 4 Feb 2020 01:34:55 +0000 (17:34 -0800)]
pinctrl: fix pxa2xx.c build warnings

Add #include of <linux/pinctrl/machine.h> to fix build
warnings in pinctrl-pxa2xx.c.  Fixes these warnings:

In file included from ../drivers/pinctrl/pxa/pinctrl-pxa2xx.c:24:0:
../drivers/pinctrl/pxa/../pinctrl-utils.h:36:8: warning: `enum pinctrl_map_type' declared inside parameter list [enabled by default]
   enum pinctrl_map_type type);
        ^
../drivers/pinctrl/pxa/../pinctrl-utils.h:36:8: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]

Link: http://lkml.kernel.org/r/0024542e-cba9-8f13-6c18-32d0050a6007@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodrivers/block/null_blk_main.c: fix uninitialized var warnings
Andrew Morton [Tue, 4 Feb 2020 01:34:52 +0000 (17:34 -0800)]
drivers/block/null_blk_main.c: fix uninitialized var warnings

With gcc-7.2, many instances of

drivers/block/null_blk_main.c: In function ‘nullb_device_zone_nr_conv_store’:
drivers/block/null_blk_main.c:291:12: warning: ‘new_value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  dev->NAME = new_value;      \
            ^
drivers/block/null_blk_main.c:279:7: note: ‘new_value’ was declared here
  TYPE new_value;       \
       ^

Presumably notabug, so use uninitialized_var() to suppress them.

Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodrivers/block/null_blk_main.c: fix layout
Andrew Morton [Tue, 4 Feb 2020 01:34:49 +0000 (17:34 -0800)]
drivers/block/null_blk_main.c: fix layout

Each line here overflows 80 cols by exactly one character.  Delete one tab
per line to fix.

Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoipc/msg.c: consolidate all xxxctl_down() functions
Lu Shuaibing [Tue, 4 Feb 2020 01:34:46 +0000 (17:34 -0800)]
ipc/msg.c: consolidate all xxxctl_down() functions

A use of uninitialized memory in msgctl_down() because msqid64 in
ksys_msgctl hasn't been initialized.  The local | msqid64 | is created in
ksys_msgctl() and then passed into msgctl_down().  Along the way msqid64
is never initialized before msgctl_down() checks msqid64->msg_qbytes.

KUMSAN(KernelUninitializedMemorySantizer, a new error detection tool)
reports:

==================================================================
BUG: KUMSAN: use of uninitialized memory in msgctl_down+0x94/0x300
Read of size 8 at addr ffff88806bb97eb8 by task syz-executor707/2022

CPU: 0 PID: 2022 Comm: syz-executor707 Not tainted 5.2.0-rc4+ #63
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 dump_stack+0x75/0xae
 __kumsan_report+0x17c/0x3e6
 kumsan_report+0xe/0x20
 msgctl_down+0x94/0x300
 ksys_msgctl.constprop.14+0xef/0x260
 do_syscall_64+0x7e/0x1f0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4400e9
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd869e0598 EFLAGS: 00000246 ORIG_RAX: 0000000000000047
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004400e9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000401970
R13: 0000000000401a00 R14: 0000000000000000 R15: 0000000000000000

The buggy address belongs to the page:
page:ffffea0001aee5c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x100000000000000()
raw: 0100000000000000 0000000000000000 ffffffff01ae0101 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kumsan: bad access detected
==================================================================

Syzkaller reproducer:
msgctl$IPC_RMID(0x0, 0x0)

C reproducer:
// autogenerated by syzkaller (https://github.com/google/syzkaller)

int main(void)
{
  syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
  syscall(__NR_msgctl, 0, 0, 0);
  return 0;
}

[natechancellor@gmail.com: adjust indentation in ksys_msgctl]
Link: https://github.com/ClangBuiltLinux/linux/issues/829
Link: http://lkml.kernel.org/r/20191218032932.37479-1-natechancellor@gmail.com
Link: http://lkml.kernel.org/r/20190613014044.24234-1-shuaibinglu@126.com
Signed-off-by: Lu Shuaibing <shuaibinglu@126.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: NeilBrown <neilb@suse.com>
From: Andrew Morton <akpm@linux-foundation.org>
Subject: drivers/block/null_blk_main.c: fix layout

Each line here overflows 80 cols by exactly one character.  Delete one tab
per line to fix.

Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoipc/sem.c: document and update memory barriers
Manfred Spraul [Tue, 4 Feb 2020 01:34:42 +0000 (17:34 -0800)]
ipc/sem.c: document and update memory barriers

Document and update the memory barriers in ipc/sem.c:

- Add smp_store_release() to wake_up_sem_queue_prepare() and
  document why it is needed.

- Read q->status using READ_ONCE+smp_acquire__after_ctrl_dep().
  as the pair for the barrier inside wake_up_sem_queue_prepare().

- Add comments to all barriers, and mention the rules in the block
  regarding locking.

- Switch to using wake_q_add_safe().

Link: http://lkml.kernel.org/r/20191020123305.14715-6-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoipc/msg.c: update and document memory barriers
Manfred Spraul [Tue, 4 Feb 2020 01:34:39 +0000 (17:34 -0800)]
ipc/msg.c: update and document memory barriers

Transfer findings from ipc/mqueue.c:

- A control barrier was missing for the lockless receive case So in
  theory, not yet initialized data may have been copied to user space -
  obviously only for architectures where control barriers are not NOP.

- use smp_store_release().  In theory, the refount may have been
  decreased to 0 already when wake_q_add() tries to get a reference.

Link: http://lkml.kernel.org/r/20191020123305.14715-5-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoipc/mqueue.c: update/document memory barriers
Manfred Spraul [Tue, 4 Feb 2020 01:34:36 +0000 (17:34 -0800)]
ipc/mqueue.c: update/document memory barriers

Update and document memory barriers for mqueue.c:

- ewp->state is read without any locks, thus READ_ONCE is required.

- add smp_aquire__after_ctrl_dep() after the READ_ONCE, we need
  acquire semantics if the value is STATE_READY.

- use wake_q_add_safe()

- document why __set_current_state() may be used:
  Reading task->state cannot happen before the wake_q_add() call,
  which happens while holding info->lock. Thus the spin_unlock()
  is the RELEASE, and the spin_lock() is the ACQUIRE.

For completeness: there is also a 3 CPU scenario, if the to be woken
up task is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE

Link: http://lkml.kernel.org/r/20191020123305.14715-4-manfred@colorfullife.com
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Waiman Long <longman@redhat.com>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoipc/mqueue.c: remove duplicated code
Davidlohr Bueso [Tue, 4 Feb 2020 01:34:32 +0000 (17:34 -0800)]
ipc/mqueue.c: remove duplicated code

pipelined_send() and pipelined_receive() are identical, so merge them.

[manfred@colorfullife.com: add changelog]
Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <1vier1@web.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agosmp_mb__{before,after}_atomic(): update Documentation
Manfred Spraul [Tue, 4 Feb 2020 01:34:29 +0000 (17:34 -0800)]
smp_mb__{before,after}_atomic(): update Documentation

When adding the _{acquire|release|relaxed}() variants of some atomic
operations, it was forgotten to update Documentation/memory_barrier.txt:

smp_mb__{before,after}_atomic() is now intended for all RMW operations
that do not imply a memory barrier.

1)
smp_mb__before_atomic();
atomic_add();

2)
smp_mb__before_atomic();
atomic_xchg_relaxed();

3)
smp_mb__before_atomic();
atomic_fetch_add_relaxed();

Invalid would be:
smp_mb__before_atomic();
atomic_set();

In addition, the patch splits the long sentence into multiple shorter
sentences.

Link: http://lkml.kernel.org/r/20191020123305.14715-2-manfred@colorfullife.com
Fixes: d96e349fe28c ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <1vier1@web.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone()
David Hildenbrand [Tue, 4 Feb 2020 01:34:26 +0000 (17:34 -0800)]
mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone()

The callers are only interested in the actual zone, they don't care about
boundaries.  Return the zone instead to simplify.

Link: http://lkml.kernel.org/r/20200110183308.11849-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: cleanup __remove_pages()
David Hildenbrand [Tue, 4 Feb 2020 01:34:23 +0000 (17:34 -0800)]
mm/memory_hotplug: cleanup __remove_pages()

Let's drop the basically unused section stuff and simplify.

Also, let's use a shorter variant to calculate the number of pages to
the next section boundary.

Link: http://lkml.kernel.org/r/20191006085646.5768-11-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: drop local variables in shrink_zone_span()
David Hildenbrand [Tue, 4 Feb 2020 01:34:19 +0000 (17:34 -0800)]
mm/memory_hotplug: drop local variables in shrink_zone_span()

Get rid of the unnecessary local variables.

Link: http://lkml.kernel.org/r/20191006085646.5768-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: don't check for "all holes" in shrink_zone_span()
David Hildenbrand [Tue, 4 Feb 2020 01:34:16 +0000 (17:34 -0800)]
mm/memory_hotplug: don't check for "all holes" in shrink_zone_span()

If we have holes, the holes will automatically get detected and removed
once we remove the next bigger/smaller section.  The extra checks can go.

Link: http://lkml.kernel.org/r/20191006085646.5768-9-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn
David Hildenbrand [Tue, 4 Feb 2020 01:34:12 +0000 (17:34 -0800)]
mm/memory_hotplug: we always have a zone in find_(smallest|biggest)_section_pfn

With shrink_pgdat_span() out of the way, we now always have a valid zone.

Link: http://lkml.kernel.org/r/20191006085646.5768-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()
David Hildenbrand [Tue, 4 Feb 2020 01:34:09 +0000 (17:34 -0800)]
mm/memory_hotplug: poison memmap in remove_pfn_range_from_zone()

Let's poison the pages similar to when adding new memory in
sparse_add_section().  Also call remove_pfn_range_from_zone() from
memunmap_pages(), so we can poison the memmap from there as well.

Link: http://lkml.kernel.org/r/20191006085646.5768-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/memmap_init: update variable name in memmap_init_zone
Aneesh Kumar K.V [Tue, 4 Feb 2020 01:34:06 +0000 (17:34 -0800)]
mm/memmap_init: update variable name in memmap_init_zone

Patch series "mm/memory_hotplug: Shrink zones before removing memory", v6.

This series fixes the access of uninitialized memmaps when shrinking
zones/nodes and when removing memory.  Also, it contains all fixes for
crashes that can be triggered when removing certain namespace using
memunmap_pages() - ZONE_DEVICE, reported by Aneesh.

We stop trying to shrink ZONE_DEVICE, as it's buggy, fixing it would be
more involved (we don't have SECTION_IS_ONLINE as an indicator), and
shrinking is only of limited use (set_zone_contiguous() cannot detect the
ZONE_DEVICE as contiguous).

We continue shrinking !ZONE_DEVICE zones, however, I reduced the amount of
code to a minimum.  Shrinking is especially necessary to keep
zone->contiguous set where possible, especially, on memory unplug of DIMMs
at zone boundaries.

--------------------------------------------------------------------------

Zones are now properly shrunk when offlining memory blocks or when
onlining failed.  This allows to properly shrink zones on memory unplug
even if the separate memory blocks of a DIMM were onlined to different
zones or re-onlined to a different zone after offlining.

Example:

:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  0
        present  0
        managed  0
:/# echo "online_movable" > /sys/devices/system/memory/memory41/state
:/# echo "online_movable" > /sys/devices/system/memory/memory43/state
:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  98304
        present  65536
        managed  65536
:/# echo 0 > /sys/devices/system/memory/memory43/online
:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  32768
        present  32768
        managed  32768
:/# echo 0 > /sys/devices/system/memory/memory41/online
:/# cat /proc/zoneinfo
Node 1, zone  Movable
        spanned  0
        present  0
        managed  0

This patch (of 6):

The third argument is actually number of pages.  Change the variable name
from size to nr_pages to indicate this better.

No functional change in this patch.

Link: http://lkml.kernel.org/r/20191006085646.5768-3-david@redhat.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm: factor out next_present_section_nr()
David Hildenbrand [Tue, 4 Feb 2020 01:34:02 +0000 (17:34 -0800)]
mm: factor out next_present_section_nr()

Let's move it to the header and use the shorter variant from
mm/page_alloc.c (the original one will also check
"__highest_present_section_nr + 1", which is not necessary).  While at
it, make the section_nr in next_pfn() const.

In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we
exceed __highest_present_section_nr, which doesn't make a difference in
the caller as it is big enough (>= all sane end_pfn).

Link: http://lkml.kernel.org/r/20200113144035.10848-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Jin, Zhi" <zhi.jin@intel.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_alloc: fix and rework pfn handling in memmap_init_zone()
David Hildenbrand [Tue, 4 Feb 2020 01:33:59 +0000 (17:33 -0800)]
mm/page_alloc: fix and rework pfn handling in memmap_init_zone()

Let's update the pfn manually whenever we continue the loop.  This makes
the code easier to read but also less error prone (and we can directly fix
one issue).

When overlap_memmap_init() returns true, pfn is updated to
"memblock_region_memory_end_pfn(r)".  So it already points at the *next*
pfn to process.  Incrementing the pfn another time is wrong, we might
leave one uninitialized.  I spotted this by inspecting the code, so I have
no idea if this is relevant in practise (with kernelcore=mirror).

Link: http://lkml.kernel.org/r/20200113144035.10848-2-david@redhat.com
Fixes: 6c09eeb39d99 ("mm: move mirrored memory specific code outside of memmap_init_zone")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Jin, Zhi" <zhi.jin@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_alloc.c: initialize memmap of unavailable memory directly
David Hildenbrand [Tue, 4 Feb 2020 01:33:55 +0000 (17:33 -0800)]
mm/page_alloc.c: initialize memmap of unavailable memory directly

Let's make sure that all memory holes are actually marked PageReserved(),
that page_to_pfn() produces reliable results, and that these pages are not
detected as "mmap" pages due to the mapcount.

E.g., booting a x86-64 QEMU guest with 4160 MB:

[    0.010585] Early memory node ranges
[    0.010586]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.010588]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
[    0.010589]   node   0: [mem 0x0000000100000000-0x0000000143ffffff]

max_pfn is 0x144000.

Before this change:

[root@localhost ~]# ./page-types -r -a 0x144000,
             flags      page-count       MB  symbolic-flags                     long-symbolic-flags
0x0000000000000800           16384       64  ___________M_______________________________        mmap
             total           16384       64

After this change:

[root@localhost ~]# ./page-types -r -a 0x144000,
             flags      page-count       MB  symbolic-flags                     long-symbolic-flags
0x0000000100000000           16384       64  ___________________________r_______________        reserved
             total           16384       64

IOW, especially the unavailable physical memory ("memory hole") in the
last section would not get properly marked PageReserved() and is indicated
to be "mmap" memory.

Drop the trace of that function from include/linux/mm.h - nobody else
needs it, and rename it accordingly.

Note: The fake zone/node might not be covered by the zone/node span.  This
is not an urgent issue (for now, we had the same node/zone due to the
zeroing).  We'll need a clean way to mark memory holes (e.g., using a page
type PageHole() if possible or a fake ZONE_INVALID) and eventually stop
marking these memory holes PageReserved().

Link: http://lkml.kernel.org/r/20191211163201.17179-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/proc/page.c: allow inspection of last section and fix end detection
David Hildenbrand [Tue, 4 Feb 2020 01:33:52 +0000 (17:33 -0800)]
fs/proc/page.c: allow inspection of last section and fix end detection

If max_pfn does not fall onto a section boundary, it is possible to
inspect PFNs up to max_pfn, and PFNs above max_pfn, however, max_pfn
itself can't be inspected.  We can have a valid (and online) memmap at and
above max_pfn if max_pfn is not aligned to a section boundary.  The whole
early section has a memmap and is marked online.  Being able to inspect
the state of these PFNs is valuable for debugging, especially because
max_pfn can change on memory hotplug and expose these memmaps.

Also, querying page flags via "./page-types -r -a 0x144001,"
(tools/vm/page-types.c) inside a x86-64 guest with 4160MB under QEMU
results in an (almost) endless loop in user space, because the end is not
detected properly when starting after max_pfn.

Instead, let's allow to inspect all pages in the highest section and
return 0 directly if we try to access pages above that section.

While at it, check the count before adjusting it, to avoid masking user
errors.

Link: http://lkml.kernel.org/r/20191211163201.17179-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agomm/page_alloc.c: fix uninitialized memmaps on a partially populated last section
David Hildenbrand [Tue, 4 Feb 2020 01:33:48 +0000 (17:33 -0800)]
mm/page_alloc.c: fix uninitialized memmaps on a partially populated last section

Patch series "mm: fix max_pfn not falling on section boundary", v2.

Playing with different memory sizes for a x86-64 guest, I discovered that
some memmaps (highest section if max_mem does not fall on the section
boundary) are marked as being valid and online, but contain garbage.  We
have to properly initialize these memmaps.

Looking at /proc/kpageflags and friends, I found some more issues,
partially related to this.

This patch (of 3):

If max_pfn is not aligned to a section boundary, we can easily run into
BUGs.  This can e.g., be triggered on x86-64 under QEMU by specifying a
memory size that is not a multiple of 128MB (e.g., 4097MB, but also
4160MB).  I was told that on real HW, we can easily have this scenario
(esp., one of the main reasons sub-section hotadd of devmem was added).

The issue is, that we have a valid memmap (pfn_valid()) for the whole
section, and the whole section will be marked "online".
pfn_to_online_page() will succeed, but the memmap contains garbage.

E.g., doing a "./page-types -r -a 0x144001" when QEMU was started with "-m
4160M" - (see tools/vm/page-types.c):

[  200.476376] BUG: unable to handle page fault for address: fffffffffffffffe
[  200.477500] #PF: supervisor read access in kernel mode
[  200.478334] #PF: error_code(0x0000) - not-present page
[  200.479076] PGD 59614067 P4D 59614067 PUD 59616067 PMD 0
[  200.479557] Oops: 0000 [#4] SMP NOPTI
[  200.479875] CPU: 0 PID: 603 Comm: page-types Tainted: G      D W         5.5.0-rc1-next-20191209 #93
[  200.480646] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu4
[  200.481648] RIP: 0010:stable_page_flags+0x4d/0x410
[  200.482061] Code: f3 ff 41 89 c0 48 b8 00 00 00 00 01 00 00 00 45 84 c0 0f 85 cd 02 00 00 48 8b 53 08 48 8b 2b 48f
[  200.483644] RSP: 0018:ffffb139401cbe60 EFLAGS: 00010202
[  200.484091] RAX: fffffffffffffffe RBX: fffffbeec5100040 RCX: 0000000000000000
[  200.484697] RDX: 0000000000000001 RSI: ffffffff9535c7cd RDI: 0000000000000246
[  200.485313] RBP: ffffffffffffffff R08: 0000000000000000 R09: 0000000000000000
[  200.485917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000144001
[  200.486523] R13: 00007ffd6ba55f48 R14: 00007ffd6ba55f40 R15: ffffb139401cbf08
[  200.487130] FS:  00007f68df717580(0000) GS:ffff9ec77fa00000(0000) knlGS:0000000000000000
[  200.487804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  200.488295] CR2: fffffffffffffffe CR3: 0000000135d48000 CR4: 00000000000006f0
[  200.488897] Call Trace:
[  200.489115]  kpageflags_read+0xe9/0x140
[  200.489447]  proc_reg_read+0x3c/0x60
[  200.489755]  vfs_read+0xc2/0x170
[  200.490037]  ksys_pread64+0x65/0xa0
[  200.490352]  do_syscall_64+0x5c/0xa0
[  200.490665]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

But it can be triggered much easier via "cat /proc/kpageflags > /dev/null"
after cold/hot plugging a DIMM to such a system:

[root@localhost ~]# cat /proc/kpageflags > /dev/null
[  111.517275] BUG: unable to handle page fault for address: fffffffffffffffe
[  111.517907] #PF: supervisor read access in kernel mode
[  111.518333] #PF: error_code(0x0000) - not-present page
[  111.518771] PGD a240e067 P4D a240e067 PUD a2410067 PMD 0

This patch fixes that by at least zero-ing out that memmap (so e.g.,
page_to_pfn() will not crash).  Commit 3af14a305d32 ("mm: zero remaining
unavailable struct pages") tried to fix a similar issue, but forgot to
consider this special case.

After this patch, there are still problems to solve.  E.g., not all of
these pages falling into a memory hole will actually get initialized later
and set PageReserved - they are only zeroed out - but at least the
immediate crashes are gone.  A follow-up patch will take care of this.

Link: http://lkml.kernel.org/r/20191211163201.17179-2-david@redhat.com
Fixes: 5ff3d93b7d26 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoocfs2: fix oops when writing cloned file
Gang He [Tue, 4 Feb 2020 01:33:45 +0000 (17:33 -0800)]
ocfs2: fix oops when writing cloned file

Writing a cloned file triggers a kernel oops and the user-space command
process is also killed by the system.  The bug can be reproduced stably
via:

1) create a file under ocfs2 file system directory.

  journalctl -b > aa.txt

2) create a cloned file for this file.

  reflink aa.txt bb.txt

3) write the cloned file with dd command.

  dd if=/dev/zero of=bb.txt bs=512 count=1 conv=notrunc

The dd command is killed by the kernel, then you can see the oops message
via dmesg command.

[  463.875404] BUG: kernel NULL pointer dereference, address: 0000000000000028
[  463.875413] #PF: supervisor read access in kernel mode
[  463.875416] #PF: error_code(0x0000) - not-present page
[  463.875418] PGD 0 P4D 0
[  463.875425] Oops: 0000 [#1] SMP PTI
[  463.875431] CPU: 1 PID: 2291 Comm: dd Tainted: G           OE     5.3.16-2-default
[  463.875433] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  463.875500] RIP: 0010:ocfs2_refcount_cow+0xa4/0x5d0 [ocfs2]
[  463.875505] Code: 06 89 6c 24 38 89 eb f6 44 24 3c 02 74 be 49 8b 47 28
[  463.875508] RSP: 0018:ffffa2cb409dfce8 EFLAGS: 00010202
[  463.875512] RAX: ffff8b1ebdca8000 RBX: 0000000000000001 RCX: ffff8b1eb73a9df0
[  463.875515] RDX: 0000000000056a01 RSI: 0000000000000000 RDI: 0000000000000000
[  463.875517] RBP: 0000000000000001 R08: ffff8b1eb73a9de0 R09: 0000000000000000
[  463.875520] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  463.875522] R13: ffff8b1eb922f048 R14: 0000000000000000 R15: ffff8b1eb922f048
[  463.875526] FS:  00007f8f44d15540(0000) GS:ffff8b1ebeb00000(0000) knlGS:0000000000000000
[  463.875529] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  463.875532] CR2: 0000000000000028 CR3: 000000003c17a000 CR4: 00000000000006e0
[  463.875546] Call Trace:
[  463.875596]  ? ocfs2_inode_lock_full_nested+0x18b/0x960 [ocfs2]
[  463.875648]  ocfs2_file_write_iter+0xaf8/0xc70 [ocfs2]
[  463.875672]  new_sync_write+0x12d/0x1d0
[  463.875688]  vfs_write+0xad/0x1a0
[  463.875697]  ksys_write+0xa1/0xe0
[  463.875710]  do_syscall_64+0x60/0x1f0
[  463.875743]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  463.875758] RIP: 0033:0x7f8f4482ed44
[  463.875762] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00
[  463.875765] RSP: 002b:00007fff300a79d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  463.875769] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f4482ed44
[  463.875771] RDX: 0000000000000200 RSI: 000055f771b5c000 RDI: 0000000000000001
[  463.875774] RBP: 0000000000000200 R08: 00007f8f44af9c78 R09: 0000000000000003
[  463.875776] R10: 000000000000089f R11: 0000000000000246 R12: 000055f771b5c000
[  463.875779] R13: 0000000000000200 R14: 0000000000000000 R15: 000055f771b5c000

This regression problem was introduced by commit f1b0ccaeb1b3 ("ocfs2:
protect extent tree in ocfs2_prepare_inode_for_write()").

Link: http://lkml.kernel.org/r/20200121050153.13290-1-ghe@suse.com
Fixes: f1b0ccaeb1b3 ("ocfs2: protect extent tree in ocfs2_prepare_inode_for_write()").
Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinitramfs: do not show compression mode choice if INITRAMFS_SOURCE is empty
Masahiro Yamada [Mon, 3 Feb 2020 16:47:08 +0000 (01:47 +0900)]
initramfs: do not show compression mode choice if INITRAMFS_SOURCE is empty

Since commit 4347422da395 ("initramfs: make compression options not
depend on INITRAMFS_SOURCE"), Kconfig asks the compression mode for
the built-in initramfs regardless of INITRAMFS_SOURCE.

It is technically simpler, but pointless from a UI perspective,
Linus says [1].

When INITRAMFS_SOURCE is empty, usr/Makefile creates a tiny default
cpio, which is so small that nobody cares about the compression.

This commit hides the Kconfig choice in that case. The default cpio
is embedded without compression, which was the original behavior.

[1]: https://lkml.org/lkml/2020/2/1/160

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Linus Torvalds [Mon, 3 Feb 2020 17:03:42 +0000 (17:03 +0000)]
Merge tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "Fixes that arrived after the merge window freeze, mostly stable
  material.

   - fix race in tree-mod-log element tracking

   - fix bio flushing inside extent writepages

   - fix assertion when in-memory tracking of discarded extents finds an
     empty tree (eg. after adding a new device)

   - update logic of temporary read-only block groups to take into
     account overcommit

   - fix some fixup worker corner cases:
       - page could not go through proper COW cycle and the dirty status
         is lost due to page migration
       - deadlock if delayed allocation is performed under page lock

   - fix send emitting invalid clones within the same file

   - fix statfs reporting 0 free space when global block reserve size is
     larger than remaining free space but there is still space for new
     chunks"

* tag 'for-5.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: do not zero f_bavail if we have available space
  Btrfs: send, fix emission of invalid clone operations within the same file
  btrfs: do not do delalloc reservation under page lock
  btrfs: drop the -EBUSY case in __extent_writepage_io
  Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker
  btrfs: take overcommit into account in inc_block_group_ro
  btrfs: fix force usage in inc_block_group_ro
  btrfs: Correctly handle empty trees in find_first_clear_extent_bit
  btrfs: flush write bio if we loop in extent_write_cache_pages
  Btrfs: fix race between adding and putting tree mod seq elements and nodes

4 years agoMerge tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt...
Linus Torvalds [Mon, 3 Feb 2020 16:59:51 +0000 (16:59 +0000)]
Merge tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux

Pull kgdb updates from Daniel Thompson:
 "Everything for kgdb this time around is either simplifications or
  clean ups.

  In particular Douglas Anderson's modifications to the backtrace
  machine in the *last* dev cycle have enabled Doug to tidy up some MIPS
  specific backtrace code and stop sharing certain data structures
  across the kernel. Note that The MIPS folks were on Cc: for the MIPS
  patch and reacted positively (but without an explicit Acked-by).

  Doug also got rid of the implicit switching between tasks and register
  sets during some but not of kdb's backtrace actions (because the
  implicit switching was either confusing for users, pointless or both).

  Finally there is a coverity fix and patch to replace open coded
  console traversal with the proper helper function"

* tag 'kgdb-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
  kdb: Use for_each_console() helper
  kdb: remove redundant assignment to pointer bp
  kdb: Get rid of confusing diag msg from "rd" if current task has no regs
  kdb: Gid rid of implicit setting of the current task / regs
  kdb: kdb_current_task shouldn't be exported
  kdb: kdb_current_regs should be private
  MIPS: kdb: Remove old workaround for backtracing on other CPUs

4 years agoMerge tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 3 Feb 2020 14:57:33 +0000 (14:57 +0000)]
Merge tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fix from Greg KH:
 "Here is a single patch, that fixes up a commit that came in the
  previous char/misc merge.

  It fixes a bug in the hpet driver that everyone keeps tripping over in
  their automated testing. Good thing is, people are catching it. Bad
  thing it wasn't caught by anyone testing before this. Oh well...

  This has been in linux-next for a few days with no reported issues"

* tag 'char-misc-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  char: hpet: Fix out-of-bounds read bug

4 years agoMerge tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee...
Linus Torvalds [Mon, 3 Feb 2020 14:55:08 +0000 (14:55 +0000)]
Merge tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight

Pull backlight updates from Lee Jones:
 "Fix-ups:
   - Remove superfluous code in ams369fg06
   - Convert over to GPIO descriptor (gpiod) in bd6107

  Bug Fixes:
   - Fix unsigned comparison to less than zero in qcom-wled"

* tag 'backlight-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
  backlight: qcom-wled: Fix unsigned comparison to zero
  backlight: bd6107: Convert to use GPIO descriptor
  backlight: ams369fg06: Drop GPIO include

4 years agoMerge tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Mon, 3 Feb 2020 14:51:57 +0000 (14:51 +0000)]
Merge tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd

Pull MFD updates from Lee Jones:
 "New Drivers:
   - Add support for ROHM BD71828 PMICs and GPIOs
   - Add support for Qualcomm Aqstic Audio Codecs WCD9340 and WCD9341

  New Device Support:
   - Add support for BD71828 to BD70528 RTC driver
   - Add support for Intel's Jasper Lake to LPSS PCI

  New Functionality:
   - Add support for Power Key to ROHM BD71828
   - Add support for Clocks to ROHM BD71828
   - Add support for GPIOs to Dialog DA9062
   - Add support for USB PD Notify to ChromiumOS EC
   - Allow callers to specify args when requesting regmap lookup; syscon

  Fix-ups:
   - Improve error handling and sanity checking; atmel-hlcdc, dln2
   - Device Tree support/documentation; bd71828, da9062, xylon,logicvc,
     ab8500, max14577, atmel-usart
   - Match devices using platform IDs; bd7xxxx
   - Refactor BD718x7 regulator component; bd718x7-regulator
   - Use standard interfaces/helpers; syscon, sm501
   - Trivial (whitespace, spelling, etc); ab8500-core, Kconfig
   - Remove unused code; db8500-prcmu, tqmx86
   - Wait until boot has finished before accessing registers;
     madera-core
   - Provide missing register value defaults; cs47l15-tables
   - Allow more time for hardware to reset; madera-core

  Bug Fixes:
   - Fix erroneous register values; rohm-bd70528
   - Fix register volatility; axp20x, rn5t618
   - Fix Kconfig dependencies; MFD_MAX77650
   - Fix incorrect compatible string; da9062-core
   - Fix syscon_regmap_lookup_by_phandle_args() stub; syscon"

* tag 'mfd-next-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (41 commits)
  mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy
  mfd: wcd934x: Add support to wcd9340/wcd9341 codec
  mfd: syscon: Add arguments support for syscon reference
  mfd: rn5t618: Mark ADC control register volatile
  dt-bindings: atmel-usart: Add microchip,sam9x60-{usart, dbgu}
  dt-bindings: atmel-usart: Remove wildcard
  mfd: cros_ec: Add cros-usbpd-notify subdevice
  mfd: da9062: Fix watchdog compatible string
  mfd: madera: Allow more time for hardware reset
  mfd: cs47l15: Add missing register default
  mfd: madera: Wait for boot done before accessing any other registers
  mfd: Kconfig: Rename Samsung to lowercase
  mfd: tqmx86: remove set but not used variable 'i2c_ien'
  mfd: dbx500-prcmu: Drop DSI pll clock functions
  mfd: dbx500-prcmu: Drop set_display_clocks()
  mfd: max77650: Select REGMAP_IRQ in Kconfig
  mfd: axp20x: Mark AXP20X_VBUS_IPSOUT_MGMT as volatile
  mfd: ab8500: Fix ab8500-clk typo
  mfd: intel-lpss: Add Intel Jasper Lake PCI IDs
  dt-bindings: mfd: max14577: Add reference to max14040_battery.txt descriptions
  ...

4 years agoMerge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyper...
Linus Torvalds [Mon, 3 Feb 2020 14:42:03 +0000 (14:42 +0000)]
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull Hyper-V updates from Sasha Levin:

 - Most of the commits here are work to enable host-initiated
   hibernation support by Dexuan Cui.

 - Fix for a warning shown when host sends non-aligned balloon requests
   by Tianyu Lan.

* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  hv_utils: Add the support of hibernation
  hv_utils: Support host-initiated hibernation request
  hv_utils: Support host-initiated restart request
  Tools: hv: Reopen the devices if read() or write() returns errors
  video: hyperv: hyperv_fb: Use physical memory for fb on HyperV Gen 1 VMs.
  Drivers: hv: vmbus: Ignore CHANNELMSG_TL_CONNECT_RESULT(23)
  video: hyperv_fb: Fix hibernation for the deferred IO feature
  Input: hyperv-keyboard: Add the support of hibernation
  hv_balloon: Balloon up according to request page number

4 years agomfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy
Geert Uytterhoeven [Thu, 30 Jan 2020 12:55:29 +0000 (13:55 +0100)]
mfd: syscon: Fix syscon_regmap_lookup_by_phandle_args() dummy

If CONFIG_MFD_SYSCON=n:

    include/linux/mfd/syscon.h:54:23: warning: ‘syscon_regmap_lookup_by_phandle_args’ defined but not used [-Wunused-function]

Fix this by adding the missing inline keyword.

Fixes: b4126b721e65ace4 ("mfd: syscon: Add arguments support for syscon reference")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
4 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Sun, 2 Feb 2020 19:50:58 +0000 (11:50 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fix from David Miller:
 "adjtimex regression fix from Arnd"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: fix adjtimex regression

4 years agoMerge tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux...
Linus Torvalds [Sun, 2 Feb 2020 19:48:46 +0000 (11:48 -0800)]
Merge tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds

Pull LED updates from Pavel Machek:

 - New driver for TI TPS6105X

 - Add managed API to get a LED from a device driver

 - Misc fixes and updates

* tag 'leds-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/pavel/linux-leds: (22 commits)
  leds: lm3692x: Disable chip on brightness 0
  leds: lm3692x: Split out lm3692x_leds_disable
  leds: lm3692x: Move lm3692x_init and rename to lm3692x_leds_enable
  leds: lm3692x: Make sure we don't exceed the maximum LED current
  dt: bindings: lm3692x: Add led-max-microamp property
  leds: lm3692x: Allow to configure over voltage protection
  dt: bindings: lm3692x: Add ti,ovp-microvolt property
  leds: populate the device's of_node
  leds: Add managed API to get a LED from a device driver
  leds: Add of_led_get() and led_put()
  leds: lm3532: add pointer to documentation and fix typo
  leds: lm3532: use extended registration so that LED can be used for backlight
  leds: lm3642: remove warnings for bad strtol, cleanup gotos
  leds: rb532: cleanup whitespace
  ledtrig-pattern: fix email address quoting in MODULE_AUTHOR()
  dt-bindings: mfd: update TI tps6105x chip bindings
  leds: tps6105x: add driver for MFD chip LED mode
  led: max77650: add of_match table
  leds: bd2802: Convert to use GPIO descriptors
  leds: pca963x: Fix open-drain initialization
  ...

4 years agoMerge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo...
Linus Torvalds [Sun, 2 Feb 2020 19:31:52 +0000 (11:31 -0800)]
Merge branch 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux

Pull pcmcia updates from Dominik Brodowski:
 "This is a series co-developed by Simon Geis and Lukas Panzer to clean
  up the i82092 PCMCIA device driver"

* 'pcmcia-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
  PCMCIA/i82092: remove #if 0 block
  PCMCIA/i82092: delete enter/leave macro
  PCMCIA/i82092: include <linux/io.h> instead of <asm/io.h>
  PCMCIA/i82092: shorten the lines with over 80 characters
  PCMCIA/i82092: move assignment out of if condition
  PCMCIA/i82092: change code indentation
  PCMCIA/i82092: insert blank line after declarations
  PCMCIA/i82092: remove braces around single statement blocks
  PCMCIA/i82092: add/remove spaces to improve readability
  PCMCIA/i82092: use dev_<level> instead of printk

4 years agobtrfs: do not zero f_bavail if we have available space
Josef Bacik [Fri, 31 Jan 2020 14:31:05 +0000 (09:31 -0500)]
btrfs: do not zero f_bavail if we have available space

There was some logic added a while ago to clear out f_bavail in statfs()
if we did not have enough free metadata space to satisfy our global
reserve.  This was incorrect at the time, however didn't really pose a
problem for normal file systems because we would often allocate chunks
if we got this low on free metadata space, and thus wouldn't really hit
this case unless we were actually full.

Fast forward to today and now we are much better about not allocating
metadata chunks all of the time.  Couple this with 3f36bff70c3e ("btrfs:
always reserve our entire size for the global reserve") which now means
we'll easily have a larger global reserve than our free space, we are
now more likely to trip over this while still having plenty of space.

Fix this by skipping this logic if the global rsv's space_info is not
full.  space_info->full is 0 unless we've attempted to allocate a chunk
for that space_info and that has failed.  If this happens then the space
for the global reserve is definitely sacred and we need to report
b_avail == 0, but before then we can just use our calculated b_avail.

Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Fixes: 79922986595d ("btrfs: statfs: report zero available if metadata are exhausted")
CC: stable@vger.kernel.org # 4.5+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-By: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
4 years agosparc64: fix adjtimex regression
Arnd Bergmann [Sat, 1 Feb 2020 21:20:52 +0000 (22:20 +0100)]
sparc64: fix adjtimex regression

Anatoly Pugachev reported one of the y2038 patches to introduce
a fatal bug from a stupid typo:

[   96.384129] watchdog: BUG: soft lockup - CPU#8 stuck for 22s!
...
[   96.385624]  [0000000000652ca4] handle_mm_fault+0x84/0x320
[   96.385668]  [0000000000b6f2bc] do_sparc64_fault+0x43c/0x820
[   96.385720]  [0000000000407754] sparc64_realfault_common+0x10/0x20
[   96.385769]  [000000000042fa28] __do_sys_sparc_clock_adjtime+0x28/0x80
[   96.385819]  [00000000004307f0] sys_sparc_clock_adjtime+0x10/0x20
[   96.385866]  [0000000000406294] linux_sparc_syscall+0x34/0x44

Fix the code to dereference the correct pointer again.

Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Fixes: 382977fc5fd9 ("y2038: sparc: remove use of struct timex")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4 years agoMerge tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 1 Feb 2020 19:22:41 +0000 (11:22 -0800)]
Merge tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fix from Steve French:
 "Small SMB3 fix for stable (fixes problem with soft mounts)"

* tag '5.6-rc-small-smb3-fix-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal module version number
  cifs: fix soft mounts hanging in the reconnect code

4 years agovfs: fix do_last() regression
Al Viro [Sat, 1 Feb 2020 16:26:45 +0000 (16:26 +0000)]
vfs: fix do_last() regression

Brown paperbag time: fetching ->i_uid/->i_mode really should've been
done from nd->inode.  I even suggested that, but the reason for that has
slipped through the cracks and I went for dir->d_inode instead - made
for more "obvious" patch.

Analysis:

 - at the entry into do_last() and all the way to step_into(): dir (aka
   nd->path.dentry) is known not to have been freed; so's nd->inode and
   it's equal to dir->d_inode unless we are already doomed to -ECHILD.
   inode of the file to get opened is not known.

 - after step_into(): inode of the file to get opened is known; dir
   might be pointing to freed memory/be negative/etc.

 - at the call of may_create_in_sticky(): guaranteed to be out of RCU
   mode; inode of the file to get opened is known and pinned; dir might
   be garbage.

The last was the reason for the original patch.  Except that at the
do_last() entry we can be in RCU mode and it is possible that
nd->path.dentry->d_inode has already changed under us.

In that case we are going to fail with -ECHILD, but we need to be
careful; nd->inode is pointing to valid struct inode and it's the same
as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
should use that.

Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Fixes: d0ed8da6c70c ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoMerge tag 'kconfig-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Sat, 1 Feb 2020 18:25:55 +0000 (10:25 -0800)]
Merge tag 'kconfig-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kconfig updates from Masahiro Yamada:

 - add 'yes2modconfig' and 'mod2yesconfig' targets (useful mainly for
   turning syzbot configs into more modular ones as a step to minimizing
   the result)

 - sanitize help text

 - various code cleanups

* tag 'kconfig-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: fix documentation typos
  kconfig: fix an "implicit declaration of function" warning
  kconfig: fix nesting of symbol help text
  kconfig: distinguish between dependencies and visibility in help text
  kconfig: list all definitions of a symbol in help text
  kconfig: Add yes2modconfig and mod2yesconfig targets.
  kconfig: use $(PERL) in Makefile
  kconfig: fix too deep indentation in Makefile
  kconfig: localmodconfig: fix indentation for closing brace
  kconfig: localmodconfig: remove unused $config
  kconfig: squash prop_alloc() into menu_add_prop()
  kconfig: remove sym from struct property
  kconfig: remove 'prompt' argument from menu_add_prop()
  kconfig: move prompt handling to menu_add_prompt() from menu_add_prop()
  kconfig: remove 'prompt' symbol
  kconfig: drop T_WORD from the RHS of 'prompt' symbol
  kconfig: use parent->dep as the parentdep of 'menu'
  kconfig: remove the rootmenu check in menu_add_prop()

4 years agoMerge tag 'kbuild-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Sat, 1 Feb 2020 18:01:52 +0000 (10:01 -0800)]
Merge tag 'kbuild-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - detect missing include guard in UAPI headers

 - do not create orphan built-in.a or obj-y objects

 - generate modules.builtin more simply, and drop tristate.conf

 - simplify built-in initramfs creation

 - make linux-headers deb package thinner

 - optimize the deb package build script

 - misc cleanups

* tag 'kbuild-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
  builddeb: split libc headers deployment out into a function
  builddeb: split kernel headers deployment out into a function
  builddeb: remove redundant make for ARCH=um
  builddeb: avoid invoking sub-shells where possible
  builddeb: remove redundant $objtree/
  builddeb: match temporary directory name to the package name
  builddeb: remove unneeded files in hdrobjfiles for headers package
  kbuild: use -S instead of -E for precise cc-option test in Kconfig
  builddeb: allow selection of .deb compressor
  kbuild: remove 'Building modules, stage 2.' log
  kbuild: remove *.tmp file when filechk fails
  kbuild: remove PYTHON2 variable
  modpost: assume STT_SPARC_REGISTER is defined
  gen_initramfs.sh: remove intermediate cpio_list on errors
  initramfs: refactor the initramfs build rules
  gen_initramfs.sh: always output cpio even without -o option
  initramfs: add default_cpio_list, and delete -d option support
  initramfs: generate dependency list and cpio at the same time
  initramfs: specify $(src)/gen_initramfs.sh as a prerequisite in Makefile
  initramfs: make initramfs compression choice non-optional
  ...

4 years agoMerge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Sat, 1 Feb 2020 17:48:37 +0000 (09:48 -0800)]
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random changes from Ted Ts'o:
 "Change /dev/random so that it uses the CRNG and only blocking if the
  CRNG hasn't initialized, instead of the old blocking pool. Also clean
  up archrandom.h, and some other miscellaneous cleanups"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (24 commits)
  s390x: Mark archrandom.h functions __must_check
  powerpc: Mark archrandom.h functions __must_check
  powerpc: Use bool in archrandom.h
  x86: Mark archrandom.h functions __must_check
  linux/random.h: Mark CONFIG_ARCH_RANDOM functions __must_check
  linux/random.h: Use false with bool
  linux/random.h: Remove arch_has_random, arch_has_random_seed
  s390: Remove arch_has_random, arch_has_random_seed
  powerpc: Remove arch_has_random, arch_has_random_seed
  x86: Remove arch_has_random, arch_has_random_seed
  random: remove some dead code of poolinfo
  random: fix typo in add_timer_randomness()
  random: Add and use pr_fmt()
  random: convert to ENTROPY_BITS for better code readability
  random: remove unnecessary unlikely()
  random: remove kernel.random.read_wakeup_threshold
  random: delete code to pull data into pools
  random: remove the blocking pool
  random: make /dev/random be almost like /dev/urandom
  random: ignore GRND_RANDOM in getentropy(2)
  ...

4 years agoMerge tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Linus Torvalds [Fri, 31 Jan 2020 22:48:54 +0000 (14:48 -0800)]
Merge tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 "Resource management:

   - Improve resource assignment for hot-added nested bridges, e.g.,
     Thunderbolt (Nicholas Johnson)

  Power management:

   - Optionally print config space of devices before suspend (Chen Yu)

   - Increase D3 delay for AMD Ryzen5/7 XHCI controllers (Daniel Drake)

  Virtualization:

   - Generalize DMA alias quirks (James Sewart)

   - Add DMA alias quirk for PLX PEX NTB (James Sewart)

   - Fix IOV memory leak (Navid Emamdoost)

  AER:

   - Log which device prevents error recovery (Yicong Yang)

  Peer-to-peer DMA:

   - Whitelist Intel SkyLake-E (Armen Baloyan)

  Broadcom iProc host bridge driver:

   - Apply PAXC quirk whether driver is built-in or module (Wei Liu)

  Broadcom STB host bridge driver:

   - Add Broadcom STB PCIe host controller driver (Jim Quinlan)

  Intel Gateway SoC host bridge driver:

   - Add driver for Intel Gateway SoC (Dilip Kota)

  Intel VMD host bridge driver:

   - Add support for DMA aliases on other buses (Jon Derrick)

   - Remove dma_map_ops overrides (Jon Derrick)

   - Remove now-unused X86_DEV_DMA_OPS (Christoph Hellwig)

  NVIDIA Tegra host bridge driver:

   - Fix Tegra30 afi_pex2_ctrl register offset (Marcel Ziswiler)

  Panasonic UniPhier host bridge driver:

   - Remove module code since driver can't be built as a module
     (Masahiro Yamada)

  Qualcomm host bridge driver:

   - Add support for SDM845 PCIe controller (Bjorn Andersson)

  TI Keystone host bridge driver:

   - Fix "num-viewport" DT property error handling (Kishon Vijay Abraham I)

   - Fix link training retries initiation (Yurii Monakov)

   - Fix outbound region mapping (Yurii Monakov)

  Misc:

   - Add Switchtec Gen4 support (Kelvin Cao)

   - Add Switchtec Intercomm Notify and Upstream Error Containment
     support (Logan Gunthorpe)

   - Use dma_set_mask_and_coherent() since Switchtec supports 64-bit
     addressing (Wesley Sheng)"

* tag 'pci-v5.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (60 commits)
  PCI: Allow adjust_bridge_window() to shrink resource if necessary
  PCI: Set resource size directly in adjust_bridge_window()
  PCI: Rename extend_bridge_window() to adjust_bridge_window()
  PCI: Rename extend_bridge_window() parameter
  PCI: Consider alignment of hot-added bridges when assigning resources
  PCI: Remove local variable usage in pci_bus_distribute_available_resources()
  PCI: Pass size + alignment to pci_bus_distribute_available_resources()
  PCI: Rename variables
  PCI: vmd: Add two VMD Device IDs
  PCI: Remove unnecessary braces
  PCI: brcmstb: Add MSI support
  PCI: brcmstb: Add Broadcom STB PCIe host controller driver
  x86/PCI: Remove X86_DEV_DMA_OPS
  PCI: vmd: Remove dma_map_ops overrides
  iommu/vt-d: Remove VMD child device sanity check
  iommu/vt-d: Use pci_real_dma_dev() for mapping
  PCI: Introduce pci_real_dma_dev()
  x86/PCI: Expose VMD's pci_dev in struct pci_sysdata
  x86/PCI: Add to_pci_sysdata() helper
  PCI/AER: Initialize aer_fifo
  ...

4 years agoMerge tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab...
Linus Torvalds [Fri, 31 Jan 2020 22:43:23 +0000 (14:43 -0800)]
Merge tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media updates from Mauro Carvalho Chehab:

 - New staging driver for Rockship ISPv1 unit

 - New staging driver for Rockchip MIPI Synopsys DPHY RX0

 - y2038 fixes at V4L2 API (backward-compatible)

 - A dvb core fix when receiving invalid EIT sections

 - Some clang-specific warnings got fixed

 - Added support for touch V4L2 interface at vivid

 - Several drivers were converted to use the new
   i2c_new_scanned_device() kAPI

 - Added sm1 support at meson's vdec driver

 - Several other driver cleanups, fixes and improvements

* tag 'media/v5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (207 commits)
  media: staging/intel-ipu3: remove TODO item about acronyms
  media: v4l2-fwnode: Print the node name while parsing endpoints
  media: Revert "media: staging/intel-ipu3: make imgu use fixed running mode"
  media: mt9v111: constify copied structure
  media: platform: VIDEO_MEDIATEK_JPEG can also depend on MTK_IOMMU
  media: uvcvideo: Add a quirk to force GEO GC6500 Camera bits-per-pixel value
  media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
  media: hantro: fix post-processing NULL pointer dereference
  media: rcar-vin: Use correct pixel format when aligning format
  media: MAINTAINERS: add entry for Rockchip ISP1 driver
  media: staging: rkisp1: add TODO file for staging
  media: staging: rkisp1: add document for rkisp1 meta buffer format
  media: staging: rkisp1: add output device for parameters
  media: staging: rkisp1: add capture device for statistics
  media: staging: rkisp1: add user space ABI definitions
  media: staging: rkisp1: add streaming paths
  media: staging: rkisp1: add Rockchip ISP1 base driver
  media: staging: phy-rockchip-dphy-rx0: add Rockchip MIPI Synopsys DPHY RX0 driver
  media: staging: dt-bindings: add Rockchip MIPI RX D-PHY RX0 yaml bindings
  media: staging: dt-bindings: add Rockchip ISP1 yaml bindings
  ...

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Fri, 31 Jan 2020 22:40:36 +0000 (14:40 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A very quiet cycle with few notable changes. Mostly the usual list of
  one or two patches to drivers changing something that isn't quite rc
  worthy. The subsystem seems to be seeing a larger number of rework and
  cleanup style patches right now, I feel that several vendors are
  prepping their drivers for new silicon.

  Summary:

   - Driver updates and cleanup for qedr, bnxt_re, hns, siw, mlx5, mlx4,
     rxe, i40iw

   - Larger series doing cleanup and rework for hns and hfi1.

   - Some general reworking of the CM code to make it a little more
     understandable

   - Unify the different code paths connected to the uverbs FD scheme

   - New UAPI ioctls conversions for get context and get async fd

   - Trace points for CQ and CM portions of the RDMA stack

   - mlx5 driver support for virtio-net formatted rings as RDMA raw
     ethernet QPs

   - verbs support for setting the PCI-E relaxed ordering bit on DMA
     traffic connected to a MR

   - A couple of bug fixes that came too late to make rc7"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (108 commits)
  RDMA/core: Make the entire API tree static
  RDMA/efa: Mask access flags with the correct optional range
  RDMA/cma: Fix unbalanced cm_id reference count during address resolve
  RDMA/umem: Fix ib_umem_find_best_pgsz()
  IB/mlx4: Fix leak in id_map_find_del
  IB/opa_vnic: Spelling correction of 'erorr' to 'error'
  IB/hfi1: Fix logical condition in msix_request_irq
  RDMA/cm: Remove CM message structs
  RDMA/cm: Use IBA functions for complex structure members
  RDMA/cm: Use IBA functions for simple structure members
  RDMA/cm: Use IBA functions for swapping get/set acessors
  RDMA/cm: Use IBA functions for simple get/set acessors
  RDMA/cm: Add SET/GET implementations to hide IBA wire format
  RDMA/cm: Add accessors for CM_REQ transport_type
  IB/mlx5: Return the administrative GUID if exists
  RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence
  IB/mlx4: Fix memory leak in add_gid error flow
  IB/mlx5: Expose RoCE accelerator counters
  RDMA/mlx5: Set relaxed ordering when requested
  RDMA/core: Add the core support field to METHOD_GET_CONTEXT
  ...

4 years agoMerge tag 'thermal-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/therm...
Linus Torvalds [Fri, 31 Jan 2020 22:39:21 +0000 (14:39 -0800)]
Merge tag 'thermal-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal fixes from Daniel Lezcano:

 - Fix a severe docs build failure for cpu idle cooling device (Randy
   Dunlap)

 - Fix a spelling mistake in the error message for the stm32 (Colin Ian
   King)

* tag 'thermal-v5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal: stm32: fix spelling mistake "preprare" -> "prepare"
  Documentation: cpu-idle-cooling: fix a SEVERE docs build failure

4 years agoMerge tag 'acpi-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 31 Jan 2020 22:38:17 +0000 (14:38 -0800)]
Merge tag 'acpi-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "Fix up MAINTAINERS entires related to ACPI (Andy Shevchenko)"

* tag 'acpi-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  MAINTAINERS: Sort entries in database for X-POWERS AXP288
  MAINTAINERS: Sort entries in database for ACPICA
  MAINTAINERS: Sort entries in database for ACPI

4 years agoMerge tag 'pm-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 31 Jan 2020 22:36:35 +0000 (14:36 -0800)]
Merge tag 'pm-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power manadement updates from Rafael Wysocki:
 "Prevent cpufreq from creating excessively large stack frames and fix
  the handling of devices deleted during system-wide resume in the PM
  core (Rafael Wysocki), revert a problematic commit affecting the
  cpupower utility and correct its man page (Thomas Renninger,
  Brahadambal Srinivasan), and improve the intel_pstate_tracer utility
  (Doug Smythies)"

* tag 'pm-5.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  tools/power/x86/intel_pstate_tracer: change several graphs to autoscale y-axis
  tools/power/x86/intel_pstate_tracer: changes for python 3 compatibility
  Correction to manpage of cpupower
  cpufreq: Avoid creating excessively large stack frames
  PM: core: Fix handling of devices deleted during system-wide resume
  cpupower: Revert library ABI changes from commit 171b2b093773eec926566

4 years agocifs: update internal module version number
Steve French [Fri, 31 Jan 2020 21:13:22 +0000 (15:13 -0600)]
cifs: update internal module version number

To 2.25

Signed-off-by: Steve French <stfrench@microsoft.com>
4 years agoMerge tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux...
Linus Torvalds [Fri, 31 Jan 2020 21:07:16 +0000 (13:07 -0800)]
Merge tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Andreas Gruenbacher:

 - Fix some corner cases on filesystems with a block size < page size.

 - Fix a corner case that could expose incorrect access times over nfs.

 - Revert an otherwise sensible revoke accounting cleanup that causes
   assertion failures. The revoke accounting is whacky and needs to be
   fixed properly before we can add back this cleanup.

 - Various other minor cleanups.

In addition, please expect to see another pull request from Bob Peterson
about his gfs2 recovery patch queue shortly.

* tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  Revert "gfs2: eliminate tr_num_revoke_rm"
  gfs2: remove unused LBIT macros
  fs/gfs2: remove unused IS_DINODE and IS_LEAF macros
  gfs2: Remove GFS2_MIN_LVB_SIZE define
  gfs2: Fix incorrect variable name
  gfs2: Avoid access time thrashing in gfs2_inode_lookup
  gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepage
  gfs2: eliminate ssize parameter from gfs2_struct2blk
  gfs2: Another gfs2_find_jhead fix

4 years agoMerge tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Fri, 31 Jan 2020 20:58:12 +0000 (12:58 -0800)]
Merge tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap fix from Darrick Wong:
 "A single patch fixing an off-by-one error when we're checking to see
  how far we're gotten into an EOF page"

* tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  fs: Fix page_mkwrite off-by-one errors

4 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Fri, 31 Jan 2020 20:16:36 +0000 (12:16 -0800)]
Merge branch 'akpm' (patches from Andrew)

Pull updates from Andrew Morton:
 "Most of -mm and quite a number of other subsystems: hotfixes, scripts,
  ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov.

  MM is fairly quiet this time.  Holidays, I assume"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
  kcov: ignore fault-inject and stacktrace
  include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
  execve: warn if process starts with executable stack
  reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
  init/main.c: fix misleading "This architecture does not have kernel memory protection" message
  init/main.c: fix quoted value handling in unknown_bootoption
  init/main.c: remove unnecessary repair_env_string in do_initcall_level
  init/main.c: log arguments and environment passed to init
  fs/binfmt_elf.c: coredump: allow process with empty address space to coredump
  fs/binfmt_elf.c: coredump: delete duplicated overflow check
  fs/binfmt_elf.c: coredump: allocate core ELF header on stack
  fs/binfmt_elf.c: make BAD_ADDR() unlikely
  fs/binfmt_elf.c: better codegen around current->mm
  fs/binfmt_elf.c: don't copy ELF header around
  fs/binfmt_elf.c: fix ->start_code calculation
  fs/binfmt_elf.c: smaller code generation around auxv vector fill
  lib/find_bit.c: uninline helper _find_next_bit()
  lib/find_bit.c: join _find_next_bit{_le}
  uapi: rename ext2_swab() to swab() and share globally in swab.h
  lib/scatterlist.c: adjust indentation in __sg_alloc_table
  ...

4 years agoMerge tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu...
Linus Torvalds [Fri, 31 Jan 2020 19:42:13 +0000 (11:42 -0800)]
Merge tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux

Pull module updates from Jessica Yu:
 "Summary of modules changes for the 5.6 merge window:

   - Add "MS" (SHF_MERGE|SHF_STRINGS) section flags to __ksymtab_strings
     to indicate to the linker that it can perform string deduplication
     (i.e., duplicate strings are reduced to a single copy in the string
     table). This means any repeated namespace string would be merged to
     just one entry in __ksymtab_strings.

   - Various code cleanups and small fixes (fix small memleak in error
     path, improve moduleparam docs, silence rcu warnings, improve error
     logging)"

* tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module.h: Annotate mod_kallsyms with __rcu
  module: avoid setting info->name early in case we can fall back to info->mod->name
  modsign: print module name along with error message
  kernel/module: Fix memleak in module_add_modinfo_attrs()
  export.h: reduce __ksymtab_strings string duplication by using "MS" section flags
  moduleparam: fix kerneldoc
  modules: lockdep: Suppress suspicious RCU usage warning

4 years agoMerge tag 'mips_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Linus Torvalds [Fri, 31 Jan 2020 19:28:31 +0000 (11:28 -0800)]
Merge tag 'mips_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS changes from Paul Burton:
 "Nothing too big or scary in here:

   - Support mremap() for the VDSO, primarily to allow CRIU to restore
     the VDSO to its checkpointed location.

   - Restore the MIPS32 cBPF JIT, after having reverted the enablement
     of the eBPF JIT for MIPS32 systems in the 5.5 cycle.

   - Improve cop0 counter synchronization behaviour whilst onlining CPUs
     by running with interrupts disabled.

   - Better match FPU behaviour when emulating multiply-accumulate
     instructions on pre-r6 systems that implement IEEE754-2008 style
     MACs.

   - Loongson64 kernels now build using the MIPS64r2 ISA, allowing them
     to take advantage of instructions introduced by r2.

   - Support for the Ingenic X1000 SoC & the really nice little CU Neo
     development board that's using it.

   - Support for WMAC on GARDENA Smart Gateway devices.

   - Lots of cleanup & refactoring of SGI IP27 (Origin 2*) support in
     preparation for introducing IP35 (Origin 3*) support.

   - Various Kconfig & Makefile cleanups"

* tag 'mips_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (60 commits)
  MIPS: PCI: Add detection of IOC3 on IO7, IO8, IO9 and Fuel
  MIPS: Loongson64: Disable exec hazard
  MIPS: Loongson64: Bump ISA level to MIPSR2
  MIPS: Make DIEI support as a config option
  MIPS: OCTEON: octeon-irq: fix spelling mistake "to" -> "too"
  MIPS: asm: local: add barriers for Loongson
  MIPS: Loongson64: Select mac2008 only feature
  MIPS: Add MAC2008 Support
  Revert "MIPS: Add custom serial.h with BASE_BAUD override for generic kernel"
  MIPS: sort MIPS and MIPS_GENERIC Kconfig selects alphabetically (again)
  MIPS: make CPU_HAS_LOAD_STORE_LR opt-out
  MIPS: generic: don't unconditionally select PINCTRL
  MIPS: don't explicitly select LIBFDT in Kconfig
  MIPS: sync-r4k: do slave counter synchronization with disabled HW interrupts
  MIPS: SGI-IP30: Check for valid pointer before using it
  MIPS: syscalls: fix indentation of the 'SYSNR' message
  MIPS: boot: fix typo in 'vmlinux.lzma.its' target
  MIPS: fix indentation of the 'RELOCS' message
  dt-bindings: Document loongson vendor-prefix
  MIPS: CU1000-Neo: Refresh defconfig to support HWMON and WiFi.
  ...

4 years agoMerge tag 'arc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Fri, 31 Jan 2020 19:26:11 +0000 (11:26 -0800)]
Merge tag 'arc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:

 - Wire up clone3 syscall

 - ARCv2 FPU state save/restore across context switch

 - AXS10x platform and misc fixes

* tag 'arc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARCv2: fpu: preserve userspace fpu state
  ARC: fpu: declutter code, move bits out into fpu.h
  ARC: wireup clone3 syscall
  ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node
  ARC: update feature support for jump-labels

4 years agoMerge tag 'riscv-for-linus-5.6-mw0' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Jan 2020 19:23:29 +0000 (11:23 -0800)]
Merge tag 'riscv-for-linus-5.6-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains a handful of patches for this merge window:

   - Support for kasan

   - 32-bit physical addresses on rv32i-based systems

   - Support for CONFIG_DEBUG_VIRTUAL

   - DT entry for the FU540 GPIO controller, which has recently had a
     device driver merged

  These boot a buildroot-based system on QEMU's virt board for me"

* tag 'riscv-for-linus-5.6-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: dts: Add DT support for SiFive FU540 GPIO driver
  riscv: mm: add support for CONFIG_DEBUG_VIRTUAL
  riscv: keep 32-bit kernel to 32-bit phys_addr_t
  kasan: Add riscv to KASAN documentation.
  riscv: Add KASAN support
  kasan: No KASAN's memmove check if archs don't have it.

4 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 31 Jan 2020 19:05:33 +0000 (11:05 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - three fixes and a cleanup for the resctrl code

   - a HyperV fix

   - a fix to /proc/kcore contents in live debugging sessions

   - a fix for the x86 decoder opcode map"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/decoder: Add TEST opcode to Group3-2
  x86/resctrl: Clean up unused function parameter in mkdir path
  x86/resctrl: Fix a deadlock due to inaccurate reference
  x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
  x86/resctrl: Fix use-after-free when deleting resource groups
  x86/hyper-v: Add "polling" bit to hv_synic_sint
  x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y

4 years agokcov: ignore fault-inject and stacktrace
Dmitry Vyukov [Fri, 31 Jan 2020 06:17:35 +0000 (22:17 -0800)]
kcov: ignore fault-inject and stacktrace

Don't instrument 3 more files that contain debugging facilities and
produce large amounts of uninteresting coverage for every syscall.

The following snippets are sprinkled all over the place in kcov traces
in a debugging kernel.  We already try to disable instrumentation of
stack unwinding code and of most debug facilities.  I guess we did not
use fault-inject.c at the time, and stacktrace.c was somehow missed (or
something has changed in kernel/configs).  This change both speeds up
kcov (kernel doesn't need to store these PCs, user-space doesn't need to
process them) and frees trace buffer capacity for more useful coverage.

  should_fail
  lib/fault-inject.c:149
  fail_dump
  lib/fault-inject.c:45

  stack_trace_save
  kernel/stacktrace.c:124
  stack_trace_consume_entry
  kernel/stacktrace.c:86
  stack_trace_consume_entry
  kernel/stacktrace.c:89
  ... a hundred frames skipped ...
  stack_trace_consume_entry
  kernel/stacktrace.c:93
  stack_trace_consume_entry
  kernel/stacktrace.c:86

Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinclude/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()
Andy Shevchenko [Fri, 31 Jan 2020 06:17:32 +0000 (22:17 -0800)]
include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc()

Use PHYS_PFN() macro in io_mapping_map_atomic_wc() instead of open coded
variant.

Link: http://lkml.kernel.org/r/20191209165624.56351-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoexecve: warn if process starts with executable stack
Alexey Dobriyan [Fri, 31 Jan 2020 06:17:29 +0000 (22:17 -0800)]
execve: warn if process starts with executable stack

There were few episodes of silent downgrade to an executable stack over
years:

1) linking innocent looking assembly file will silently add executable
   stack if proper linker options is not given as well:

$ cat f.S
.intel_syntax noprefix
.text
.globl f
f:
        ret

$ cat main.c
void f(void);
int main(void)
{
        f();
        return 0;
}

$ gcc main.c f.S
$ readelf -l ./a.out
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                         0x0000000000000000 0x0000000000000000  RWE    0x10
   ^^^

2) converting C99 nested function into a closure
   https://nullprogram.com/blog/2019/11/15/

void intsort2(int *base, size_t nmemb, _Bool invert)
{
    int cmp(const void *a, const void *b)
    {
        int r = *(int *)a - *(int *)b;
        return invert ? -r : r;
    }
    qsort(base, nmemb, sizeof(*base), cmp);
}

will silently require stack trampolines while non-closure version will
not.

Without doubt this behaviour is documented somewhere, add a warning so
that developers and users can at least notice.  After so many years of
x86_64 having proper executable stack support it should not cause too
many problems.

Link: http://lkml.kernel.org/r/20191208171918.GC19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Will Deacon <will@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoreiserfs: prevent NULL pointer dereference in reiserfs_insert_item()
Yunfeng Ye [Fri, 31 Jan 2020 06:17:26 +0000 (22:17 -0800)]
reiserfs: prevent NULL pointer dereference in reiserfs_insert_item()

The variable inode may be NULL in reiserfs_insert_item(), but there is
no check before accessing the member of inode.

Fix this by adding NULL pointer check before calling reiserfs_debug().

Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Cc: zhengbin <zhengbin13@huawei.com>
Cc: Hu Shiyuan <hushiyuan@huawei.com>
Cc: Feilong Lin <linfeilong@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinit/main.c: fix misleading "This architecture does not have kernel memory protection...
Christophe Leroy [Fri, 31 Jan 2020 06:17:23 +0000 (22:17 -0800)]
init/main.c: fix misleading "This architecture does not have kernel memory protection" message

This message leads to thinking that memory protection is not implemented
for the said architecture, whereas absence of CONFIG_STRICT_KERNEL_RWX
only means that memory protection has not been selected at compile time.

Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is
selected by the architecture.  Instead, print "Kernel memory protection
not selected by kernel config."

Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.fr
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinit/main.c: fix quoted value handling in unknown_bootoption
Arvind Sankar [Fri, 31 Jan 2020 06:17:19 +0000 (22:17 -0800)]
init/main.c: fix quoted value handling in unknown_bootoption

Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2.

unknown_bootoption passes unrecognized command line arguments to init as
either environment variables or arguments.  Some of the logic in the
function is broken for quoted command line arguments.

When an argument of the form param="value" is processed by parse_args
and passed to unknown_bootoption, the command line has

  param\0"value\0

with val pointing to the beginning of value.  The helper function
repair_env_string is then used to restore the '=' character that was
removed by parse_args, and strip the quotes off fully.  This results in

  param=value\0\0

and val ends up pointing to the 'a' instead of the 'v' in value.  This
bug was introduced when repair_env_string was refactored into a separate
function, and the decrement of val in repair_env_string became dead
code.

This causes two problems in unknown_bootoption in the two places where
the val pointer is used as a substitute for the length of param:

1. An argument of the form param=".value" is misinterpreted as a
   potential module parameter, with the result that it will not be
   placed in init's environment.

2. An argument of the form param="value" is checked to see if param is
   an existing environment variable that should be overwritten, but the
   comparison is off-by-one and compares 'param=v' instead of 'param='
   against the existing environment. So passing, for example,
   TERM="vt100" on the command line results in init being passed both
   TERM=linux and TERM=vt100 in its environment.

Patch 1 adds logging for the arguments and environment passed to init
and is independent of the rest: it can be dropped if this is
unnecessarily verbose.

Patch 2 removes repair_env_string from initcall parameter parsing in
do_initcall_level, as that uses a separate copy of the command line now
and the repairing is no longer necessary.

Patch 3 fixes the bug in unknown_bootoption by recording the length of
param explicitly instead of implying it from val-param.

This patch (of 3):

Commit c962c618b93f ("init: fix bug where environment vars can't be
passed via boot args") introduced two minor bugs in unknown_bootoption
by factoring out the quoted value handling into a separate function.

When value is quoted, repair_env_string will move the value up 1 byte to
strip the quotes, so val in unknown_bootoption no longer points to the
actual location of the value.

The result is that an argument of the form param=".value" is mistakenly
treated as a potential module parameter and is not placed in init's
environment, and an argument of the form param="value" can result in a
duplicate environment variable: eg TERM="vt100" on the command line will
result in both TERM=linux and TERM=vt100 being placed into init's
environment.

Fix this by recording the length of the param before calling
repair_env_string instead of relying on val.

Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinit/main.c: remove unnecessary repair_env_string in do_initcall_level
Arvind Sankar [Fri, 31 Jan 2020 06:17:16 +0000 (22:17 -0800)]
init/main.c: remove unnecessary repair_env_string in do_initcall_level

Since commit 182c2b3bab7c ("init: fix in-place parameter modification
regression"), parse_args in do_initcall_level is called on a copy of
saved_command_line.  It is unnecessary to call repair_env_string during
this parsing, as this copy is not used for anything later.

Remove the now unnecessary arguments from repair_env_string as well.

Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Krzysztof Mazur <krzysiek@podlesie.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinit/main.c: log arguments and environment passed to init
Arvind Sankar [Fri, 31 Jan 2020 06:17:13 +0000 (22:17 -0800)]
init/main.c: log arguments and environment passed to init

Extend logging in `run_init_process` to also show the arguments and
environment that we are passing to init.

Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.edu
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: coredump: allow process with empty address space to coredump
Alexey Dobriyan [Fri, 31 Jan 2020 06:17:10 +0000 (22:17 -0800)]
fs/binfmt_elf.c: coredump: allow process with empty address space to coredump

Unmapping whole address space at once with

munmap(0, (1ULL<<47) - 4096)

or equivalent will create empty coredump.

It is silly way to exit, however registers content may still be useful.

The right to coredump is fundamental right of a process!

Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: coredump: delete duplicated overflow check
Alexey Dobriyan [Fri, 31 Jan 2020 06:17:07 +0000 (22:17 -0800)]
fs/binfmt_elf.c: coredump: delete duplicated overflow check

array_size() macro will do overflow check anyway.

Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: coredump: allocate core ELF header on stack
Alexey Dobriyan [Fri, 31 Jan 2020 06:17:04 +0000 (22:17 -0800)]
fs/binfmt_elf.c: coredump: allocate core ELF header on stack

Comment says ELF header is "too large to be on stack".  64 bytes on
64-bit is not large by any means.

Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: make BAD_ADDR() unlikely
Alexey Dobriyan [Fri, 31 Jan 2020 06:17:01 +0000 (22:17 -0800)]
fs/binfmt_elf.c: make BAD_ADDR() unlikely

If some mapping goes past TASK_SIZE it will be rejected by kernel which
means no such userspace binaries exist.

Mark every such check as unlikely.

Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: better codegen around current->mm
Alexey Dobriyan [Fri, 31 Jan 2020 06:16:58 +0000 (22:16 -0800)]
fs/binfmt_elf.c: better codegen around current->mm

"current->mm" pointer is stable in general except few cases one of which
execve(2).  Compiler can't treat is as stable but it _is_ stable most of
the time.  During ELF loading process ->mm becomes stable right after
flush_old_exec().

Help compiler by caching current->mm, otherwise it continues to refetch
it.

add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-141 (-141)
Function                                     old     new   delta
elf_core_dump                               5062    5039     -23
load_elf_binary                             5426    5308    -118

Note: other cases are left as is because it is either pessimisation or
no change in binary size.

Link: http://lkml.kernel.org/r/20191215124755.GB21124@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: don't copy ELF header around
Alexey Dobriyan [Fri, 31 Jan 2020 06:16:55 +0000 (22:16 -0800)]
fs/binfmt_elf.c: don't copy ELF header around

ELF header is read into bprm->buf[] by generic execve code.

Save a memcpy and allocate just one header for the interpreter instead
of two headers (64 bytes instead of 128 on 64-bit).

Link: http://lkml.kernel.org/r/20191208171242.GA19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: fix ->start_code calculation
Alexey Dobriyan [Fri, 31 Jan 2020 06:16:52 +0000 (22:16 -0800)]
fs/binfmt_elf.c: fix ->start_code calculation

Only executable segments should be accounted to ->start_code just like
they do to ->end_code (correctly).

Link: http://lkml.kernel.org/r/20191208171410.GB19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agofs/binfmt_elf.c: smaller code generation around auxv vector fill
Alexey Dobriyan [Fri, 31 Jan 2020 06:16:50 +0000 (22:16 -0800)]
fs/binfmt_elf.c: smaller code generation around auxv vector fill

Filling auxv vector as array with index (auxv[i++] = ...) generates
terrible code.  "saved_auxv" should be reworked because it is the worst
member of mm_struct by size/usefullness ratio but do it later.

Meanwhile help gcc a little with *auxv++ idiom.

Space savings on x86_64:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-127 (-127)
Function                                     old     new   delta
load_elf_binary                             5470    5343    -127

Link: http://lkml.kernel.org/r/20191208172301.GD19716@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/find_bit.c: uninline helper _find_next_bit()
Yury Norov [Fri, 31 Jan 2020 06:16:47 +0000 (22:16 -0800)]
lib/find_bit.c: uninline helper _find_next_bit()

It saves 25% of .text for arm64, and more for BE architectures.

Before:
  $ size lib/find_bit.o
     text    data     bss     dec     hex filename
     1012      56       0    1068     42c lib/find_bit.o

After:
  $ size lib/find_bit.o
     text    data     bss     dec     hex filename
      776      56       0     832     340 lib/find_bit.o

Link: http://lkml.kernel.org/r/20200103202846.21616-3-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Allison Randal <allison@lohutok.net>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/find_bit.c: join _find_next_bit{_le}
Yury Norov [Fri, 31 Jan 2020 06:16:43 +0000 (22:16 -0800)]
lib/find_bit.c: join _find_next_bit{_le}

_find_next_bit and _find_next_bit_le are very similar functions.  It's
possible to join them by adding 1 parameter and a couple of simple
checks.  It's simplify maintenance and make possible to shrink the size
of .text by un-inlining the unified function (in the following patch).

Link: http://lkml.kernel.org/r/20200103202846.21616-2-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agouapi: rename ext2_swab() to swab() and share globally in swab.h
Yury Norov [Fri, 31 Jan 2020 06:16:40 +0000 (22:16 -0800)]
uapi: rename ext2_swab() to swab() and share globally in swab.h

ext2_swab() is defined locally in lib/find_bit.c However it is not
specific to ext2, neither to bitmaps.

There are many potential users of it, so rename it to just swab() and
move to include/uapi/linux/swab.h

ABI guarantees that size of unsigned long corresponds to BITS_PER_LONG,
therefore drop unneeded cast.

Link: http://lkml.kernel.org/r/20200103202846.21616-1-yury.norov@gmail.com
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/scatterlist.c: adjust indentation in __sg_alloc_table
Nathan Chancellor [Fri, 31 Jan 2020 06:16:37 +0000 (22:16 -0800)]
lib/scatterlist.c: adjust indentation in __sg_alloc_table

Clang warns:

  ../lib/scatterlist.c:314:5: warning: misleading indentation; statement
  is not part of the previous 'if' [-Wmisleading-indentation]
                          return -ENOMEM;
                          ^
  ../lib/scatterlist.c:311:4: note: previous statement is here
                          if (prv)
                          ^
  1 warning generated.

This warning occurs because there is a space before the tab on this
line.  Remove it so that the indentation is consistent with the Linux
kernel coding style and clang no longer warns.

Link: http://lkml.kernel.org/r/20191218033606.11942-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/830
Fixes: 08e17246d1b3 ("scatterlist: prevent invalid free when alloc fails")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agobtrfs: use larger zlib buffer for s390 hardware compression
Mikhail Zaslonko [Fri, 31 Jan 2020 06:16:33 +0000 (22:16 -0800)]
btrfs: use larger zlib buffer for s390 hardware compression

In order to benefit from s390 zlib hardware compression support,
increase the btrfs zlib workspace buffer size from 1 to 4 pages (if s390
zlib hardware support is enabled on the machine).

This brings up to 60% better performance in hardware on s390 compared to
the PAGE_SIZE buffer and much more compared to the software zlib
processing in btrfs.  In case of memory pressure, fall back to a single
page buffer during workspace allocation.

The data compressed with larger input buffers will still conform to zlib
standard and thus can be decompressed also on a systems that uses only
PAGE_SIZE buffer for btrfs zlib.

Link: http://lkml.kernel.org/r/20200108105103.29028-1-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/zlib: add zlib_deflate_dfltcc_enabled() function
Mikhail Zaslonko [Fri, 31 Jan 2020 06:16:30 +0000 (22:16 -0800)]
lib/zlib: add zlib_deflate_dfltcc_enabled() function

Add a new function to zlib.h checking if s390 Deflate-Conversion
facility is installed and enabled.

Link: http://lkml.kernel.org/r/20200103223334.20669-6-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agos390/boot: add dfltcc= kernel command line parameter
Mikhail Zaslonko [Fri, 31 Jan 2020 06:16:27 +0000 (22:16 -0800)]
s390/boot: add dfltcc= kernel command line parameter

Add the new kernel command line parameter 'dfltcc=' to configure s390
zlib hardware support.

Format: { on | off | def_only | inf_only | always }
 on:       s390 zlib hardware support for compression on
           level 1 and decompression (default)
 off:      No s390 zlib hardware support
 def_only: s390 zlib hardware support for deflate
           only (compression on level 1)
 inf_only: s390 zlib hardware support for inflate
           only (decompression)
 always:   Same as 'on' but ignores the selected compression
           level always using hardware support (used for debugging)

Link: http://lkml.kernel.org/r/20200103223334.20669-5-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/zlib: add s390 hardware support for kernel zlib_inflate
Mikhail Zaslonko [Fri, 31 Jan 2020 06:16:23 +0000 (22:16 -0800)]
lib/zlib: add s390 hardware support for kernel zlib_inflate

Add decompression functions to zlib_dfltcc library.  Update zlib_inflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware inflate
decompression.

Link: http://lkml.kernel.org/r/20200103223334.20669-4-zaslonko@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agos390/boot: rename HEAP_SIZE due to name collision
Mikhail Zaslonko [Fri, 31 Jan 2020 06:16:20 +0000 (22:16 -0800)]
s390/boot: rename HEAP_SIZE due to name collision

Change the conflicting macro name in preparation for zlib_inflate
hardware support.

Link: http://lkml.kernel.org/r/20200103223334.20669-3-zaslonko@linux.ibm.com
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agolib/zlib: add s390 hardware support for kernel zlib_deflate
Mikhail Zaslonko [Fri, 31 Jan 2020 06:16:17 +0000 (22:16 -0800)]
lib/zlib: add s390 hardware support for kernel zlib_deflate

Patch series "S390 hardware support for kernel zlib", v3.

With IBM z15 mainframe the new DFLTCC instruction is available.  It
implements deflate algorithm in hardware (Nest Acceleration Unit - NXU)
with estimated compression and decompression performance orders of
magnitude faster than the current zlib.

This patchset adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:

https://github.com/madler/zlib/pull/410

The coding style is also preserved for future maintainability.  There is
only limited set of userspace zlib functions represented in kernel.
Apart from that, all the memory allocation should be performed in
advance.  Thus, the workarea structures are extended with the parameter
lists required for the DEFLATE CONVENTION CALL instruction.

Since kernel zlib itself does not support gzip headers, only Adler-32
checksum is processed (also can be produced by DFLTCC facility).  Like
it was implemented for userspace, kernel zlib will compress in hardware
on level 1, and in software on all other levels.  Decompression will
always happen in hardware (when enabled).

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page.  Therefore care should be taken when using hardware compression
when reproducible results are desired.  However it does always produce
the standard conform output which can be inflated anyway.

The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:

    Format: { on | off | def_only | inf_only | always }
     on:       s390 zlib hardware support for compression on
               level 1 and decompression (default)
     off:      No s390 zlib hardware support
     def_only: s390 zlib hardware support for deflate
               only (compression on level 1)
     inf_only: s390 zlib hardware support for inflate
               only (decompression)
     always:   Same as 'on' but ignores the selected compression
               level always using hardware support (used for debugging)

The main purpose of the integration of the NXU support into the kernel
zlib is the use of hardware deflate in btrfs filesystem with on-the-fly
compression enabled.  Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image

With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch
6) the following performance results have been achieved using the
ramdisk with btrfs.  These are relative numbers based on throughput rate
and compression ratio for zlib level 1:

  Input data              Deflate rate   Inflate rate   Compression ratio
                          NXU/Software   NXU/Software   NXU/Software
  stream of zeroes        1.46           1.02           1.00
  random ASCII data       10.44          3.00           0.96
  ASCII text (dickens)    6,21           3.33           0.94
  binary data (vmlinux)   8,37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (refers
to all compression levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using DD
command-line utility on btrfs on a Fedora 30 based internal driver in
native LPAR on a z15 system.  Results may vary based on individual
workload, configuration and software levels.

This patch (of 9):

Create zlib_dfltcc library with the s390 DEFLATE CONVERSION CALL
implementation and related compression functions.  Update zlib_deflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware deflate.

Link: http://lkml.kernel.org/r/20200103223334.20669-2-zaslonko@linux.ibm.com
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Co-developed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoiio: adc: qcom-vadc-common: use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:16:13 +0000 (22:16 -0800)]
iio: adc: qcom-vadc-common: use <linux/units.h> helpers

This switches the qcom-vadc-common to use milli_kelvin_to_millicelsius()
in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-13-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agothermal: armada: remove unused TO_MCELSIUS macro
Akinobu Mita [Fri, 31 Jan 2020 06:16:09 +0000 (22:16 -0800)]
thermal: armada: remove unused TO_MCELSIUS macro

This removes unused TO_MCELSIUS() macro.

Link: http://lkml.kernel.org/r/1576386975-7941-12-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoiwlwifi: use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:16:05 +0000 (22:16 -0800)]
iwlwifi: use <linux/units.h> helpers

This switches the iwlwifi driver to use celsius_to_kelvin() and
kelvin_to_celsius() in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-11-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoiwlegacy: use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:16:01 +0000 (22:16 -0800)]
iwlegacy: use <linux/units.h> helpers

This switches the iwlegacy driver to use celsius_to_kelvin() and
kelvin_to_celsius() in <linux/units.h>.

[akinobu.mita@gmail.com: fix build warnings with format string]
Link: http://lkml.kernel.org/r/1579014483-9226-1-git-send-email-akinobu.mita@gmail.com
Link: https://lore.kernel.org/r/20200106171452.201c3b4c@canb.auug.org.au
Link: http://lkml.kernel.org/r/1576386975-7941-10-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agothermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h>
Akinobu Mita [Fri, 31 Jan 2020 06:15:57 +0000 (22:15 -0800)]
thermal: remove kelvin to/from Celsius conversion helpers from <linux/thermal.h>

This removes the kelvin to/from Celsius conversion helper macros in
<linux/thermal.h> which were switched to the inline helper functions in
<linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-9-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agonvme: hwmon: switch to use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:15:53 +0000 (22:15 -0800)]
nvme: hwmon: switch to use <linux/units.h> helpers

This switches the nvme driver to use kelvin_to_millicelsius() and
millicelsius_to_kelvin() in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-8-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agothermal: intel_pch: switch to use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:15:48 +0000 (22:15 -0800)]
thermal: intel_pch: switch to use <linux/units.h> helpers

This switches the intel pch thermal driver to use
deci_kelvin_to_millicelsius() in <linux/units.h> instead of helpers in
<linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius
conversion helpers in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-7-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agothermal: int340x: switch to use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:15:44 +0000 (22:15 -0800)]
thermal: int340x: switch to use <linux/units.h> helpers

This switches the int340x thermal zone driver to use
deci_kelvin_to_millicelsius() and millicelsius_to_deci_kelvin() in
<linux/units.h> instead of helpers in <linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius
conversion helpers in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-6-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoplatform/x86: intel_menlow: switch to use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:15:40 +0000 (22:15 -0800)]
platform/x86: intel_menlow: switch to use <linux/units.h> helpers

This switches the intel_menlow driver to use deci_kelvin_to_celsius()
and celsius_to_deci_kelvin() in <linux/units.h> instead of helpers in
<linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius
conversion helpers in <linux/units.h>.

This also removes a trailing space, while we're at it.

Link: http://lkml.kernel.org/r/1576386975-7941-5-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoplatform/x86: asus-wmi: switch to use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:15:37 +0000 (22:15 -0800)]
platform/x86: asus-wmi: switch to use <linux/units.h> helpers

The asus-wmi driver doesn't implement the thermal device functionality
directly, so including <linux/thermal.h> just for
DECI_KELVIN_TO_CELSIUS() is a bit odd.

This switches the asus-wmi driver to use deci_kelvin_to_millicelsius()
in <linux/units.h>.

The format string is changed from %d to %ld due to function returned
type.

Link: http://lkml.kernel.org/r/1576386975-7941-4-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoACPI: thermal: switch to use <linux/units.h> helpers
Akinobu Mita [Fri, 31 Jan 2020 06:15:33 +0000 (22:15 -0800)]
ACPI: thermal: switch to use <linux/units.h> helpers

This switches the ACPI thermal zone driver to use
celsius_to_deci_kelvin(), deci_kelvin_to_celsius(), and
deci_kelvin_to_millicelsius_with_offset() in <linux/units.h> instead of
helpers in <linux/thermal.h>.

This is preparation for centralizing the kelvin to/from Celsius
conversion helpers in <linux/units.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-3-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agoinclude/linux/units.h: add helpers for kelvin to/from Celsius conversion
Akinobu Mita [Fri, 31 Jan 2020 06:15:28 +0000 (22:15 -0800)]
include/linux/units.h: add helpers for kelvin to/from Celsius conversion

Patch series "add header file for kelvin to/from Celsius conversion
helpers", v4.

There are several helper macros to convert kelvin to/from Celsius in
<linux/thermal.h> for thermal drivers.  These are useful for any other
drivers or subsystems, but it's odd to include <linux/thermal.h> just
for the helpers.

This adds a new <linux/units.h> that provides the equivalent inline
functions for any drivers or subsystems, and switches all the users of
conversion helpers in <linux/thermal.h> to use <linux/units.h> helpers.

This patch (of 12):

There are several helper macros to convert kelvin to/from Celsius in
<linux/thermal.h> for thermal drivers.  These are useful for any other
drivers or subsystems, but it's odd to include <linux/thermal.h> just
for the helpers.

This adds a new <linux/units.h> that provides the equivalent inline
functions for any drivers or subsystems.  It is intended to replace the
helpers in <linux/thermal.h>.

Link: http://lkml.kernel.org/r/1576386975-7941-2-git-send-email-akinobu.mita@gmail.com
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Sujith Thomas <sujith.thomas@intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amit.kucheria@verdurent.com>
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Kalle Valo <kvalo@codeaurora.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Cc: Luca Coelho <luciano.coelho@intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agodrivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback...
Colin Ian King [Fri, 31 Jan 2020 06:15:25 +0000 (22:15 -0800)]
drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store

Currently when an error code -EIO or -ENOSPC in the for-loop of
writeback_store the error code is being overwritten by a ret = len
assignment at the end of the function and the error codes are being
lost.  Fix this by assigning ret = len at the start of the function and
remove the assignment from the end, hence allowing ret to be preserved
when error codes are assigned to it.

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
Fixes: 5a7c53a008fb ("zram: support idle/huge page writeback")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4 years agozram: try to avoid worst-case scenario on same element pages
Taejoon Song [Fri, 31 Jan 2020 06:15:22 +0000 (22:15 -0800)]
zram: try to avoid worst-case scenario on same element pages

The worst-case scenario on finding same element pages is that almost all
elements are same at the first glance but only last few elements are
different.

Since the same element tends to be grouped from the beginning of the
pages, if we check the first element with the last element before
looping through all elements, we might have some chances to quickly
detect non-same element pages.

 1. Test is done under LG webOS TV (64-bit arch)
 2. Dump the swap-out pages (~819200 pages)
 3. Analyze the pages with simple test script which counts the iteration
    number and measures the speed at off-line

Under 64-bit arch, the worst iteration count is PAGE_SIZE / 8 bytes =
512.  The speed is based on the time to consume page_same_filled()
function only.  The result, on average, is listed as below:

                                     Num of Iter    Speed(MB/s)
  Looping-Forward (Orig)                 38            99265
  Looping-Backward                       36           102725
  Last-element-check (This Patch)        33           125072

The result shows that the average iteration count decreases by 13% and
the speed increases by 25% with this patch.  This patch does not
increase the overall time complexity, though.

I also ran simpler version which uses backward loop.  Just looping
backward also makes some improvement, but less than this patch.

[taejoon.song@lge.com: fix off-by-one]
Link: http://lkml.kernel.org/r/1578642001-11765-1-git-send-email-taejoon.song@lge.com
Link: http://lkml.kernel.org/r/1575424418-16119-1-git-send-email-taejoon.song@lge.com
Signed-off-by: Taejoon Song <taejoon.song@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>