]> git.baikalelectronics.ru Git - kernel.git/log
kernel.git
3 years agoblock: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output
Johannes Thumshirn [Mon, 4 Oct 2021 07:22:07 +0000 (16:22 +0900)]
block: decode QUEUE_FLAG_HCTX_ACTIVE in debugfs output

While debugging an issue we've found that $DEBUGFS/block/$disk/state
doesn't decode QUEUE_FLAG_HCTX_ACTIVE but only displays its numerical
value.

Add QUEUE_FLAG(HCTX_ACTIVE) to the blk_queue_flag_name array so it'll get
decoded properly.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/4351076388918075bd80ef07756f9d2ce63be12c.1633332053.git.johannes.thumshirn@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: genhd: fix double kfree() in __alloc_disk_node()
Tetsuo Handa [Sat, 2 Oct 2021 09:23:02 +0000 (18:23 +0900)]
block: genhd: fix double kfree() in __alloc_disk_node()

syzbot is reporting use-after-free read at bdev_free_inode() [1], for
kfree() from __alloc_disk_node() is called before bdev_free_inode()
(which is called after RCU grace period) reads bdev->bd_disk and calls
kfree(bdev->bd_disk).

Fix use-after-free read followed by double kfree() problem
by making sure that bdev->bd_disk is NULL when calling iput().

Link: https://syzkaller.appspot.com/bug?extid=8281086e8a6fbfbd952a
Reported-by: syzbot <syzbot+8281086e8a6fbfbd952a@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e6dd13c5-8db0-4392-6e78-a42ee5d2a1c4@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agonbd: use shifts rather than multiplies
Nick Desaulniers [Mon, 20 Sep 2021 23:25:33 +0000 (16:25 -0700)]
nbd: use shifts rather than multiplies

commit f8381a6d4288 ("nbd: add the check to prevent overflow in
__nbd_ioctl()") raised an issue from the fallback helpers added in
commit fe18122dc389 ("compiler.h: enable builtin overflow checkers and
add fallback code")

ERROR: modpost: "__divdi3" [drivers/block/nbd.ko] undefined!

As Stephen Rothwell notes:
  The added check_mul_overflow() call is being passed 64 bit values.
  COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW is not set for this build (see
  include/linux/overflow.h).

Specifically, the helpers for checking whether the results of a
multiplication overflowed (__unsigned_mul_overflow,
__signed_add_overflow) use the division operator when
!COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW.  This is problematic for 64b
operands on 32b hosts.

This was fixed upstream by
commit e582da50536d ("Documentation: raise minimum supported version of
GCC to 5.1")
which is not suitable to be backported to stable.

Further, __builtin_mul_overflow() would emit a libcall to a
compiler-rt-only symbol when compiling with clang < 14 for 32b targets.

ld.lld: error: undefined symbol: __mulodi4

In order to keep stable buildable with GCC 4.9 and clang < 14, modify
struct nbd_config to instead track the number of bits of the block size;
reconstructing the block size using runtime checked shifts that are not
problematic for those compilers and in a ways that can be backported to
stable.

In nbd_set_size, we do validate that the value of blksize must be a
power of two (POT) and is in the range of [512, PAGE_SIZE] (both
inclusive).

This does modify the debugfs interface.

Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Link: https://github.com/ClangBuiltLinux/linux/issues/1438
Link: https://lore.kernel.org/all/20210909182525.372ee687@canb.auug.org.au/
Link: https://lore.kernel.org/stable/CAHk-=whiQBofgis_rkniz8GBP9wZtSZdcDEffgSLO62BUGV3gg@mail.gmail.com/
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Kees Cook <keescook@chromium.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20210920232533.4092046-1-ndesaulniers@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoRevert "block, bfq: honor already-setup queue merges"
Jens Axboe [Tue, 28 Sep 2021 12:33:15 +0000 (06:33 -0600)]
Revert "block, bfq: honor already-setup queue merges"

This reverts commit fde25220c9a54ad2df67bae8072231c2e122f4ef.

We have had several folks complain that this causes hangs for them, which
is especially problematic as the commit has also hit stable already.

As no resolution seems to be forthcoming right now, revert the patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214503
Fixes: fde25220c9a5 ("block, bfq: honor already-setup queue merges")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agonvme: add command id quirk for apple controllers
Keith Busch [Mon, 27 Sep 2021 15:43:06 +0000 (08:43 -0700)]
nvme: add command id quirk for apple controllers

Some apple controllers use the command id as an index to implementation
specific data structures and will fail if the value is out of bounds.
The nvme driver's recently introduced command sequence number breaks
this controller.

Provide a quirk so these spec incompliant controllers can function as
before. The driver will not have the ability to detect bad completions
when this quirk is used, but we weren't previously checking this anyway.

The quirk bit was selected so that it can readily apply to stable.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=214509
Cc: Sven Peter <sven@svenpeter.dev>
Reported-by: Orlando Chamberlain <redecorating@protonmail.com>
Reported-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Sven Peter <sven@svenpeter.dev>
Link: https://lore.kernel.org/r/20210927154306.387437-1-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: hold ->invalidate_lock in blkdev_fallocate
Ming Lei [Thu, 23 Sep 2021 02:37:51 +0000 (10:37 +0800)]
block: hold ->invalidate_lock in blkdev_fallocate

When running ->fallocate(), blkdev_fallocate() should hold
mapping->invalidate_lock to prevent page cache from being accessed,
otherwise stale data may be read in page cache.

Without this patch, blktests block/009 fails sometimes. With this patch,
block/009 can pass always.

Also as Jan pointed out, no pages can be created in the discarded area
while you are holding the invalidate_lock, so remove the 2nd
truncate_bdev_range().

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210923023751.1441091-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblktrace: Fix uaf in blk_trace access after removing by sysfs
Zhihao Cheng [Thu, 23 Sep 2021 13:49:21 +0000 (21:49 +0800)]
blktrace: Fix uaf in blk_trace access after removing by sysfs

There is an use-after-free problem triggered by following process:

      P1(sda) P2(sdb)
echo 0 > /sys/block/sdb/trace/enable
  blk_trace_remove_queue
    synchronize_rcu
    blk_trace_free
      relay_close
rcu_read_lock
__blk_add_trace
  trace_note_tsk
  (Iterate running_trace_list)
        relay_close_buf
  relay_destroy_buf
    kfree(buf)
    trace_note(sdb's bt)
      relay_reserve
        buf->offset <- nullptr deference (use-after-free) !!!
rcu_read_unlock

[  502.714379] BUG: kernel NULL pointer dereference, address:
0000000000000010
[  502.715260] #PF: supervisor read access in kernel mode
[  502.715903] #PF: error_code(0x0000) - not-present page
[  502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0
[  502.717252] Oops: 0000 [#1] SMP
[  502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
[  502.732872] Call Trace:
[  502.733193]  __blk_add_trace.cold+0x137/0x1a3
[  502.733734]  blk_add_trace_rq+0x7b/0xd0
[  502.734207]  blk_add_trace_rq_issue+0x54/0xa0
[  502.734755]  blk_mq_start_request+0xde/0x1b0
[  502.735287]  scsi_queue_rq+0x528/0x1140
...
[  502.742704]  sg_new_write.isra.0+0x16e/0x3e0
[  502.747501]  sg_ioctl+0x466/0x1100

Reproduce method:
  ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
  ioctl(/dev/sda, BLKTRACESTART)
  ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
  ioctl(/dev/sdb, BLKTRACESTART)

  echo 0 > /sys/block/sdb/trace/enable &
  // Add delay(mdelay/msleep) before kernel enters blk_trace_free()

  ioctl$SG_IO(/dev/sda, SG_IO, ...)
  // Enters trace_note_tsk() after blk_trace_free() returned
  // Use mdelay in rcu region rather than msleep(which may schedule out)

Remove blk_trace from running_list before calling blk_trace_free() by
sysfs if blk_trace is at Blktrace_running state.

Fixes: caa50b432d555d ("blktrace: add ftrace plugin")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: don't call rq_qos_ops->done_bio if the bio isn't tracked
Ming Lei [Fri, 24 Sep 2021 11:07:04 +0000 (19:07 +0800)]
block: don't call rq_qos_ops->done_bio if the bio isn't tracked

rq_qos framework is only applied on request based driver, so:

1) rq_qos_done_bio() needn't to be called for bio based driver

2) rq_qos_done_bio() needn't to be called for bio which isn't tracked,
such as bios ended from error handling code.

Especially in bio_endio():

1) request queue is referred via bio->bi_bdev->bd_disk->queue, which
may be gone since request queue refcount may not be held in above two
cases

2) q->rq_qos may be freed in blk_cleanup_queue() when calling into
__rq_qos_done_bio()

Fix the potential kernel panic by not calling rq_qos_ops->done_bio if
the bio isn't tracked. This way is safe because both ioc_rqos_done_bio()
and blkcg_iolatency_done_bio() are nop if the bio isn't tracked.

Reported-by: Yu Kuai <yukuai3@huawei.com>
Cc: tj@kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210924110704.1541818-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme into block-5.15
Jens Axboe [Fri, 24 Sep 2021 13:15:21 +0000 (07:15 -0600)]
Merge tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme into block-5.15

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.15:

 - keep ctrl->namespaces ordered (me)
 - fix incorrect h2cdata pdu offset accounting in nvme-tcp
   (Sagi Grimberg)
 - handled updated hw_queues in nvme-fc more carefully (Daniel Wagner,
   James Smart)"

* tag 'nvme-5.15-2021-09-24' of git://git.infradead.org/nvme:
  nvme: keep ctrl->namespaces ordered
  nvme-tcp: fix incorrect h2cdata pdu offset accounting
  nvme-fc: remove freeze/unfreeze around update_nr_hw_queues
  nvme-fc: avoid race between time out and tear down
  nvme-fc: update hardware queues before using them

3 years agoMerge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md...
Jens Axboe [Wed, 22 Sep 2021 18:43:18 +0000 (12:43 -0600)]
Merge branch 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.15

Pull MD fix from Song.

* 'md-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
  md: fix a lock order reversal in md_alloc

3 years agomd: fix a lock order reversal in md_alloc
Christoph Hellwig [Wed, 1 Sep 2021 11:38:29 +0000 (13:38 +0200)]
md: fix a lock order reversal in md_alloc

Commit 1358197baa861e ("md: Fix race when creating a new md device.")
not only moved assigning mddev->gendisk before calling add_disk, which
fixes the races described in the commit log, but also added a
mddev->open_mutex critical section over add_disk and creation of the
md kobj.  Adding a kobject after add_disk is racy vs deleting the gendisk
right after adding it, but md already prevents against that by holding
a mddev->active reference.

On the other hand taking this lock added a lock order reversal with what
is not disk->open_mutex (used to be bdev->bd_mutex when the commit was
added) for partition devices, which need that lock for the internal open
for the partition scan, and a recent commit also takes it for
non-partitioned devices, leading to further lockdep splatter.

Fixes: 1358197baa86 ("md: Fix race when creating a new md device.")
Fixes: e2b06c0b7c9f ("block: support delayed holder registration")
Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
3 years agonvme: keep ctrl->namespaces ordered
Christoph Hellwig [Tue, 14 Sep 2021 06:38:20 +0000 (08:38 +0200)]
nvme: keep ctrl->namespaces ordered

Various places in the nvme code that rely on ctrl->namespace to be
ordered.  Ensure that the namespae is inserted into the list at the
right position from the start instead of sorting it after the fact.

Fixes: e0a487cadca7 ("NVMe: Implement namespace list scanning")
Reported-by: Anton Eidelman <anton.eidelman@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
3 years agonvme-tcp: fix incorrect h2cdata pdu offset accounting
Sagi Grimberg [Tue, 14 Sep 2021 15:38:55 +0000 (18:38 +0300)]
nvme-tcp: fix incorrect h2cdata pdu offset accounting

When the controller sends us multiple r2t PDUs in a single
request we need to account for it correctly as our send/recv
context run concurrently (i.e. we get a new r2t with r2t_offset
before we updated our iterator and req->data_sent marker). This
can cause wrong offsets to be sent to the controller.

To fix that, we will first know that this may happen only in
the send sequence of the last page, hence we will take
the r2t_offset to the h2c PDU data_offset, and in
nvme_tcp_try_send_data loop, we make sure to increment
the request markers also when we completed a PDU but
we are expecting more r2t PDUs as we still did not send
the entire data of the request.

Fixes: e768b78848fd ("nvme-tcp: fix possible use-after-completion")
Reported-by: Nowak, Lukasz <Lukasz.Nowak@Dell.com>
Tested-by: Nowak, Lukasz <Lukasz.Nowak@Dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-fc: remove freeze/unfreeze around update_nr_hw_queues
James Smart [Tue, 14 Sep 2021 09:20:08 +0000 (11:20 +0200)]
nvme-fc: remove freeze/unfreeze around update_nr_hw_queues

Remove the freeze/unfreeze around changes to the number of hardware
queues. Study and retest has indicated there are no ios that can be
active at this point so there is nothing to freeze.

nvme-fc is draining the queues in the shutdown and error recovery path
in __nvme_fc_abort_outstanding_ios.

This patch primarily reverts 08a896479459 "nvme-fc: wait for queues to
freeze before calling update_hr_hw_queues". It's not an exact revert as
it leaves the adjusting of hw queues only if the count changes.

Signed-off-by: James Smart <jsmart2021@gmail.com>
[dwagner: added explanation why no IO is pending]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-fc: avoid race between time out and tear down
James Smart [Tue, 14 Sep 2021 09:20:07 +0000 (11:20 +0200)]
nvme-fc: avoid race between time out and tear down

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue.

This patch merges the admin and io sync ops into the queue teardown logic
as shown in the RDMA patch d6be6bf350 "nvme-rdma: avoid race between time
out and tear down". There is no teardown_lock in nvme-fc.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-fc: update hardware queues before using them
Daniel Wagner [Tue, 14 Sep 2021 09:20:06 +0000 (11:20 +0200)]
nvme-fc: update hardware queues before using them

In case the number of hardware queues changes, we need to update the
tagset and the mapping of ctx to hctx first.

If we try to create and connect the I/O queues first, this operation
will fail (target will reject the connect call due to the wrong number
of queues) and hence we bail out of the recreate function. Then we
will to try the very same operation again, thus we don't make any
progress.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd
Li Jinlin [Tue, 14 Sep 2021 04:26:05 +0000 (12:26 +0800)]
blk-cgroup: fix UAF by grabbing blkcg lock before destroying blkg pd

KASAN reports a use-after-free report when doing fuzz test:

[693354.104835] ==================================================================
[693354.105094] BUG: KASAN: use-after-free in bfq_io_set_weight_legacy+0xd3/0x160
[693354.105336] Read of size 4 at addr ffff888be0a35664 by task sh/1453338

[693354.105607] CPU: 41 PID: 1453338 Comm: sh Kdump: loaded Not tainted 4.18.0-147
[693354.105610] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.81 07/02/2018
[693354.105612] Call Trace:
[693354.105621]  dump_stack+0xf1/0x19b
[693354.105626]  ? show_regs_print_info+0x5/0x5
[693354.105634]  ? printk+0x9c/0xc3
[693354.105638]  ? cpumask_weight+0x1f/0x1f
[693354.105648]  print_address_description+0x70/0x360
[693354.105654]  kasan_report+0x1b2/0x330
[693354.105659]  ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105665]  ? bfq_io_set_weight_legacy+0xd3/0x160
[693354.105670]  bfq_io_set_weight_legacy+0xd3/0x160
[693354.105675]  ? bfq_cpd_init+0x20/0x20
[693354.105683]  cgroup_file_write+0x3aa/0x510
[693354.105693]  ? ___slab_alloc+0x507/0x540
[693354.105698]  ? cgroup_file_poll+0x60/0x60
[693354.105702]  ? 0xffffffff89600000
[693354.105708]  ? usercopy_abort+0x90/0x90
[693354.105716]  ? mutex_lock+0xef/0x180
[693354.105726]  kernfs_fop_write+0x1ab/0x280
[693354.105732]  ? cgroup_file_poll+0x60/0x60
[693354.105738]  vfs_write+0xe7/0x230
[693354.105744]  ksys_write+0xb0/0x140
[693354.105749]  ? __ia32_sys_read+0x50/0x50
[693354.105760]  do_syscall_64+0x112/0x370
[693354.105766]  ? syscall_return_slowpath+0x260/0x260
[693354.105772]  ? do_page_fault+0x9b/0x270
[693354.105779]  ? prepare_exit_to_usermode+0xf9/0x1a0
[693354.105784]  ? enter_from_user_mode+0x30/0x30
[693354.105793]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.105875] Allocated by task 1453337:
[693354.106001]  kasan_kmalloc+0xa0/0xd0
[693354.106006]  kmem_cache_alloc_node_trace+0x108/0x220
[693354.106010]  bfq_pd_alloc+0x96/0x120
[693354.106015]  blkcg_activate_policy+0x1b7/0x2b0
[693354.106020]  bfq_create_group_hierarchy+0x1e/0x80
[693354.106026]  bfq_init_queue+0x678/0x8c0
[693354.106031]  blk_mq_init_sched+0x1f8/0x460
[693354.106037]  elevator_switch_mq+0xe1/0x240
[693354.106041]  elevator_switch+0x25/0x40
[693354.106045]  elv_iosched_store+0x1a1/0x230
[693354.106049]  queue_attr_store+0x78/0xb0
[693354.106053]  kernfs_fop_write+0x1ab/0x280
[693354.106056]  vfs_write+0xe7/0x230
[693354.106060]  ksys_write+0xb0/0x140
[693354.106064]  do_syscall_64+0x112/0x370
[693354.106069]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.106114] Freed by task 1453336:
[693354.106225]  __kasan_slab_free+0x130/0x180
[693354.106229]  kfree+0x90/0x1b0
[693354.106233]  blkcg_deactivate_policy+0x12c/0x220
[693354.106238]  bfq_exit_queue+0xf5/0x110
[693354.106241]  blk_mq_exit_sched+0x104/0x130
[693354.106245]  __elevator_exit+0x45/0x60
[693354.106249]  elevator_switch_mq+0xd6/0x240
[693354.106253]  elevator_switch+0x25/0x40
[693354.106257]  elv_iosched_store+0x1a1/0x230
[693354.106261]  queue_attr_store+0x78/0xb0
[693354.106264]  kernfs_fop_write+0x1ab/0x280
[693354.106268]  vfs_write+0xe7/0x230
[693354.106271]  ksys_write+0xb0/0x140
[693354.106275]  do_syscall_64+0x112/0x370
[693354.106280]  entry_SYSCALL_64_after_hwframe+0x65/0xca

[693354.106329] The buggy address belongs to the object at ffff888be0a35580
                 which belongs to the cache kmalloc-1k of size 1024
[693354.106736] The buggy address is located 228 bytes inside of
                 1024-byte region [ffff888be0a35580ffff888be0a35980)
[693354.107114] The buggy address belongs to the page:
[693354.107273] page:ffffea002f828c00 count:1 mapcount:0 mapping:ffff888107c17080 index:0x0 compound_mapcount: 0
[693354.107606] flags: 0x17ffffc0008100(slab|head)
[693354.107760] raw: 0017ffffc0008100 ffffea002fcbc808 ffffea0030bd3a08 ffff888107c17080
[693354.108020] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
[693354.108278] page dumped because: kasan: bad access detected

[693354.108511] Memory state around the buggy address:
[693354.108671]  ffff888be0a35500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[693354.116396]  ffff888be0a35580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.124473] >ffff888be0a35600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.132421]                                                        ^
[693354.140284]  ffff888be0a35680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.147912]  ffff888be0a35700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[693354.155281] ==================================================================

blkgs are protected by both queue and blkcg locks and holding
either should stabilize them. However, the path of destroying
blkg policy data is only protected by queue lock in
blkcg_activate_policy()/blkcg_deactivate_policy(). Other tasks
can get the blkg policy data before the blkg policy data is
destroyed, and use it after destroyed, which will result in a
use-after-free.

CPU0                             CPU1
blkcg_deactivate_policy
  spin_lock_irq(&q->queue_lock)
                                 bfq_io_set_weight_legacy
                                   spin_lock_irq(&blkcg->lock)
                                   blkg_to_bfqg(blkg)
                                     pd_to_bfqg(blkg->pd[pol->plid])
                                     ^^^^^^blkg->pd[pol->plid] != NULL
                                           bfqg != NULL
  pol->pd_free_fn(blkg->pd[pol->plid])
    pd_to_bfqg(blkg->pd[pol->plid])
    bfqg_put(bfqg)
      kfree(bfqg)
  blkg->pd[pol->plid] = NULL
  spin_unlock_irq(q->queue_lock);
                                   bfq_group_set_weight(bfqg, val, 0)
                                     bfqg->entity.new_weight
                                     ^^^^^^trigger uaf here
                                   spin_unlock_irq(&blkcg->lock);

Fix by grabbing the matching blkcg lock before trying to
destroy blkg policy data.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Li Jinlin <lijinlin3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210914042605.3260596-1-lijinlin3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblkcg: fix memory leak in blk_iolatency_init
Yanfei Xu [Wed, 15 Sep 2021 07:24:26 +0000 (15:24 +0800)]
blkcg: fix memory leak in blk_iolatency_init

BUG: memory leak
unreferenced object 0xffff888129acdb80 (size 96):
  comm "syz-executor.1", pid 12661, jiffies 4294962682 (age 15.220s)
  hex dump (first 32 bytes):
    20 47 c9 85 ff ff ff ff 20 d4 8e 29 81 88 ff ff   G...... ..)....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff82264ec8>] kmalloc include/linux/slab.h:591 [inline]
    [<ffffffff82264ec8>] kzalloc include/linux/slab.h:721 [inline]
    [<ffffffff82264ec8>] blk_iolatency_init+0x28/0x190 block/blk-iolatency.c:724
    [<ffffffff8225b8c4>] blkcg_init_queue+0xb4/0x1c0 block/blk-cgroup.c:1185
    [<ffffffff822253da>] blk_alloc_queue+0x22a/0x2e0 block/blk-core.c:566
    [<ffffffff8223b175>] blk_mq_init_queue_data block/blk-mq.c:3100 [inline]
    [<ffffffff8223b175>] __blk_mq_alloc_disk+0x25/0xd0 block/blk-mq.c:3124
    [<ffffffff826a9303>] loop_add+0x1c3/0x360 drivers/block/loop.c:2344
    [<ffffffff826a966e>] loop_control_get_free drivers/block/loop.c:2501 [inline]
    [<ffffffff826a966e>] loop_control_ioctl+0x17e/0x2e0 drivers/block/loop.c:2516
    [<ffffffff81597eec>] vfs_ioctl fs/ioctl.c:51 [inline]
    [<ffffffff81597eec>] __do_sys_ioctl fs/ioctl.c:874 [inline]
    [<ffffffff81597eec>] __se_sys_ioctl fs/ioctl.c:860 [inline]
    [<ffffffff81597eec>] __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:860
    [<ffffffff843fa745>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff843fa745>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Once blk_throtl_init() queue init failed, blkcg_iolatency_exit() will
not be invoked for cleanup. That leads a memory leak. Swap the
blk_throtl_init() and blk_iolatency_init() calls can solve this.

Reported-by: syzbot+01321b15cc98e6bf96d6@syzkaller.appspotmail.com
Fixes: 55a05fe0b4ac (block/blk-cgroup: Swap the blk_throtl_init() and blk_iolatency_init() calls)
Signed-off-by: Yanfei Xu <yanfei.xu@windriver.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20210915072426.4022924-1-yanfei.xu@windriver.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoMerge tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme into block-5.15
Jens Axboe [Wed, 15 Sep 2021 13:53:32 +0000 (07:53 -0600)]
Merge tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme into block-5.15

Pull NVMe fixes from Christoph:

"nvme fixes for Linux 5.15

 - fix ANA state updates when a namespace is not present (Anton Eidelman)
 - nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show
   (Dan Carpenter)
 - avoid race in shutdown namespace removal (Daniel Wagner)
 - fix io_work priority inversion in nvme-tcp (Keith Busch)
 - destroy cm id before destroy qp to avoid use after free (Ruozhu Li)"

* tag 'nvme-5.15-2021-09-15' of git://git.infradead.org/nvme:
  nvme-tcp: fix io_work priority inversion
  nvme-rdma: destroy cm id before destroy qp to avoid use after free
  nvme-multipath: fix ANA state updates when a namespace is not present
  nvme: avoid race in shutdown namespace removal
  nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()

3 years agonvme: remove the call to nvme_update_disk_info in nvme_ns_remove
Christoph Hellwig [Tue, 14 Sep 2021 07:06:57 +0000 (09:06 +0200)]
nvme: remove the call to nvme_update_disk_info in nvme_ns_remove

There is no need to explicitly unregister the integrity profile when
deleting the gendisk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: flush the integrity workqueue in blk_integrity_unregister
Lihong Kou [Tue, 14 Sep 2021 07:06:56 +0000 (09:06 +0200)]
block: flush the integrity workqueue in blk_integrity_unregister

When the integrity profile is unregistered there can still be integrity
reads queued up which could see a NULL verify_fn as shown by the race
window below:

CPU0                                    CPU1
  process_one_work                      nvme_validate_ns
    bio_integrity_verify_fn                nvme_update_ns_info
                                     nvme_update_disk_info
                                       blk_integrity_unregister
                                               ---set queue->integrity as 0
bio_integrity_process
--access bi->profile->verify_fn(bi is a pointer of queue->integity)

Before calling blk_integrity_unregister in nvme_update_disk_info, we must
make sure that there is no work item in the kintegrityd_wq. Just call
blk_flush_integrity to flush the work queue so the bug can be resolved.

Signed-off-by: Lihong Kou <koulihong@huawei.com>
[hch: split up and shortened the changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoblock: check if a profile is actually registered in blk_integrity_unregister
Christoph Hellwig [Tue, 14 Sep 2021 07:06:55 +0000 (09:06 +0200)]
block: check if a profile is actually registered in blk_integrity_unregister

While clearing the profile itself is harmless, we really should not clear
the stable writes flag if it wasn't set due to a registered integrity
profile.

Reported-by: Lihong Kou <koulihong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20210914070657.87677-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agonvme-tcp: fix io_work priority inversion
Keith Busch [Thu, 9 Sep 2021 15:54:52 +0000 (08:54 -0700)]
nvme-tcp: fix io_work priority inversion

Dispatching requests inline with the .queue_rq() call may block while
holding the send_mutex. If the tcp io_work also happens to schedule, it
may see the req_list is non-empty, leaving "pending" true and remaining
in TASK_RUNNING. Since io_work is of higher scheduling priority, the
.queue_rq task may not get a chance to run, blocking forward progress
and leading to io timeouts.

Instead of checking for pending requests within io_work, let the queueing
restart io_work outside the send_mutex lock if there is more work to be
done.

Fixes: 93c663b707fff ("nvme-tcp: rerun io_work if req_list is not empty")
Reported-by: Samuel Jones <sjones@kalrayinc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-rdma: destroy cm id before destroy qp to avoid use after free
Ruozhu Li [Mon, 6 Sep 2021 03:51:34 +0000 (11:51 +0800)]
nvme-rdma: destroy cm id before destroy qp to avoid use after free

We should always destroy cm_id before destroy qp to avoid to get cma
event after qp was destroyed, which may lead to use after free.
In RDMA connection establishment error flow, don't destroy qp in cm
event handler.Just report cm_error to upper level, qp will be destroy
in nvme_rdma_alloc_queue() after destroy cm id.

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvme-multipath: fix ANA state updates when a namespace is not present
Anton Eidelman [Sun, 12 Sep 2021 18:54:57 +0000 (12:54 -0600)]
nvme-multipath: fix ANA state updates when a namespace is not present

nvme_update_ana_state() has a deficiency that results in a failure to
properly update the ana state for a namespace in the following case:

  NSIDs in ctrl->namespaces: 1, 3,    4
  NSIDs in desc->nsids: 1, 2, 3, 4

Loop iteration 0:
    ns index = 0, n = 0, ns->head->ns_id = 1, nsid = 1, MATCH.
Loop iteration 1:
    ns index = 1, n = 1, ns->head->ns_id = 3, nsid = 2, NO MATCH.
Loop iteration 2:
    ns index = 2, n = 2, ns->head->ns_id = 4, nsid = 4, MATCH.

Where the update to the ANA state of NSID 3 is missed.  To fix this
increment n and retry the update with the same ns when ns->head->ns_id is
higher than nsid,

Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
3 years agonvme: avoid race in shutdown namespace removal
Daniel Wagner [Thu, 2 Sep 2021 09:20:02 +0000 (11:20 +0200)]
nvme: avoid race in shutdown namespace removal

When we remove the siblings entry, we update ns->head->list, hence we
can't separate the removal and test for being empty. They have to be
in the same critical section to avoid a race.

To avoid breaking the refcounting imbalance again, add a list empty
check to nvme_find_ns_head.

Fixes: 92621ff6c9cd ("nvme: fix refcounting imbalance when all paths are down")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agonvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()
Dan Carpenter [Thu, 9 Sep 2021 09:14:40 +0000 (12:14 +0300)]
nvmet: fix a width vs precision bug in nvmet_subsys_attr_serial_show()

This was intended to limit the number of characters printed from
"subsys->serial" to NVMET_SN_MAX_SIZE.  But accidentally the width
specifier was used instead of the precision specifier so it only
affects the alignment and not the number of characters printed.

Fixes: cba3ef262607 ("nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
3 years agoblk-mq: avoid to iterate over stale request
Ming Lei [Mon, 6 Sep 2021 06:50:03 +0000 (14:50 +0800)]
blk-mq: avoid to iterate over stale request

blk-mq can't run allocating driver tag and updating ->rqs[tag]
atomically, meantime blk-mq doesn't clear ->rqs[tag] after the driver
tag is released.

So there is chance to iterating over one stale request just after the
tag is allocated and before updating ->rqs[tag].

scsi_host_busy_iter() calls scsi_host_check_in_flight() to count scsi
in-flight requests after scsi host is blocked, so no new scsi command can
be marked as SCMD_STATE_INFLIGHT. However, driver tag allocation still can
be run by blk-mq core. One request is marked as SCMD_STATE_INFLIGHT,
but this request may have been kept in another slot of ->rqs[], meantime
the slot can be allocated out but ->rqs[] isn't updated yet. Then this
in-flight request is counted twice as SCMD_STATE_INFLIGHT. This way causes
trouble in handling scsi error.

Fixes the issue by not iterating over stale request.

Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Reported-by: luojiaxing <luojiaxing@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20210906065003.439019-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoLinux 5.15-rc1
Linus Torvalds [Sun, 12 Sep 2021 23:28:37 +0000 (16:28 -0700)]
Linux 5.15-rc1

3 years agoMerge tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Sun, 12 Sep 2021 23:18:15 +0000 (16:18 -0700)]
Merge tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull more perf tools updates from Arnaldo Carvalho de Melo:

 - Add missing fields and remove some duplicate fields when printing a
   perf_event_attr.

 - Fix hybrid config terms list corruption.

 - Update kernel header copies, some resulted in new kernel features
   being automagically added to 'perf trace' syscall/tracepoint argument
   id->string translators.

 - Add a file generated during the documentation build to .gitignore.

 - Add an option to build without libbfd, as some distros, like Debian
   consider its ABI unstable.

 - Add support to print a textual representation of IBS raw sample data
   in 'perf report'.

 - Fix bpf 'perf test' sample mismatch reporting

 - Fix passing arguments to stackcollapse report in a 'perf script'
   python script.

 - Allow build-id with trailing zeros.

 - Look for ImageBase in PE file to compute .text offset.

* tag 'perf-tools-for-v5.15-2021-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (25 commits)
  tools headers UAPI: Update tools's copy of drm.h headers
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Sync linux/fs.h with the kernel sources
  tools headers UAPI: Sync linux/in.h copy with the kernel sources
  perf tools: Add an option to build without libbfd
  perf tools: Allow build-id with trailing zeros
  perf tools: Fix hybrid config terms list corruption
  perf tools: Factor out copy_config_terms() and free_config_terms()
  perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
  perf tools: Ignore Documentation dependency file
  perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
  tools include UAPI: Update linux/mount.h copy
  perf beauty: Cover more flags in the  move_mount syscall argument beautifier
  tools headers UAPI: Sync linux/prctl.h with the kernel sources
  tools include UAPI: Sync sound/asound.h copy with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources
  tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
  perf report: Add support to print a textual representation of IBS raw sample data
  perf report: Add tools/arch/x86/include/asm/amd-ibs.h
  perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
  ...

3 years agoMerge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda...
Linus Torvalds [Sun, 12 Sep 2021 23:09:26 +0000 (16:09 -0700)]
Merge tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux

Pull compiler attributes updates from Miguel Ojeda:

 - Fix __has_attribute(__no_sanitize_coverage__) for GCC 4 (Marco Elver)

 - Add Nick as Reviewer for compiler_attributes.h (Nick Desaulniers)

 - Move __compiletime_{error|warning} (Nick Desaulniers)

* tag 'compiler-attributes-for-linus-v5.15-rc1-v2' of git://github.com/ojeda/linux:
  compiler_attributes.h: move __compiletime_{error|warning}
  MAINTAINERS: add Nick as Reviewer for compiler_attributes.h
  Compiler Attributes: fix __has_attribute(__no_sanitize_coverage__) for GCC 4

3 years agoMerge tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux
Linus Torvalds [Sun, 12 Sep 2021 23:00:49 +0000 (16:00 -0700)]
Merge tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux

Pull auxdisplay updates from Miguel Ojeda:
 "An assortment of improvements for auxdisplay:

   - Replace symbolic permissions with octal permissions (Jinchao Wang)

   - ks0108: Switch to use module_parport_driver() (Andy Shevchenko)

   - charlcd: Drop unneeded initializers and switch to C99 style (Andy
     Shevchenko)

   - hd44780: Fix oops on module unloading (Lars Poeschel)

   - Add I2C gpio expander example (Ralf Schlatterbeck)"

* tag 'auxdisplay-for-linus-v5.15-rc1' of git://github.com/ojeda/linux:
  auxdisplay: Replace symbolic permissions with octal permissions
  auxdisplay: ks0108: Switch to use module_parport_driver()
  auxdisplay: charlcd: Drop unneeded initializers and switch to C99 style
  auxdisplay: hd44780: Fix oops on module unloading
  auxdisplay: Add I2C gpio expander example

3 years agoMerge tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 12 Sep 2021 19:42:51 +0000 (12:42 -0700)]
Merge tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull CPU hotplug updates from Thomas Gleixner:
 "Updates for the SMP and CPU hotplug:

   - Remove DEFINE_SMP_CALL_CACHE_FUNCTION() which is a left over of the
     original hotplug code and now causing trouble with the ARM64 cache
     topology setup due to the pointless SMP function call.

     It's not longer required as the hotplug callbacks are guaranteed to
     be invoked on the upcoming CPU.

   - Remove the deprecated and now unused CPU hotplug functions

   - Rewrite the CPU hotplug API documentation"

* tag 'smp-urgent-2021-09-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Documentation: core-api/cpuhotplug: Rewrite the API section
  cpu/hotplug: Remove deprecated CPU-hotplug functions.
  thermal: Replace deprecated CPU-hotplug functions.
  drivers: base: cacheinfo: Get rid of DEFINE_SMP_CALL_CACHE_FUNCTION()

3 years agoMerge tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Sep 2021 18:56:00 +0000 (11:56 -0700)]
Merge tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull misc driver fix from Greg KH:
 "Here is a single patch for 5.15-rc1, for the lkdtm misc driver.

  It resolves a build issue that many people were hitting with your
  current tree, and Kees and others felt would be good to get merged
  before -rc1 comes out, to prevent them from having to constantly hit
  it as many development trees restart on -rc1, not older -rc releases.

  It has NOT been in linux-next, but has passed 0-day testing and looks
  'obviously correct' when reviewing it locally :)"

* tag 'char-misc-5.15-rc1-lkdtm' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  lkdtm: Use init_uts_ns.name instead of macros

3 years agoMerge tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi
Linus Torvalds [Sun, 12 Sep 2021 18:44:58 +0000 (11:44 -0700)]
Merge tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi

Pull IPMI updates from Corey Minyard:
 "A couple of very minor fixes for style and rate limiting.

  Nothing big, but probably needs to go in"

* tag 'for-linus-5.15-1' of git://github.com/cminyard/linux-ipmi:
  char: ipmi: use DEVICE_ATTR helper macro
  ipmi: rate limit ipmi smi_event failure message

3 years agoMerge tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Sep 2021 18:37:41 +0000 (11:37 -0700)]
Merge tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Make sure the idle timer expires in hardirq context, on PREEMPT_RT

 - Make sure the run-queue balance callback is invoked only on the
   outgoing CPU

* tag 'sched_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Prevent balance_push() on remote runqueues
  sched/idle: Make the idle timer expire in hard interrupt context

3 years agoMerge tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Sep 2021 18:27:05 +0000 (11:27 -0700)]
Merge tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Borislav Petkov:

 - Fix the futex PI requeue machinery to not return to userspace in
   inconsistent state

 - Avoid a potential null pointer dereference in the ww_mutex deadlock
   check

 - Other smaller cleanups and optimizations

* tag 'locking_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Fix ww_mutex deadlock check
  futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic()
  futex: Avoid redundant task lookup
  futex: Clarify comment for requeue_pi_wake_futex()
  futex: Prevent inconsistent state and exit race
  futex: Return error code instead of assigning it without effect
  locking/rwsem: Add missing __init_rwsem() for PREEMPT_RT

3 years agoMerge tag 'timers_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 12 Sep 2021 18:10:31 +0000 (11:10 -0700)]
Merge tag 'timers_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Borislav Petkov:

 - Handle negative second values properly when converting a timespec64
   to nanoseconds.

* tag 'timers_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Handle negative seconds correctly in timespec64_to_ns()

3 years agoMerge branch 'misc.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 12 Sep 2021 17:43:51 +0000 (10:43 -0700)]
Merge branch 'misc.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull namei updates from Al Viro:
 "Clearing fallout from mkdirat in io_uring series. The fix in the
  kern_path_locked() patch plus associated cleanups"

* 'misc.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  putname(): IS_ERR_OR_NULL() is wrong here
  namei: Standardize callers of filename_create()
  namei: Standardize callers of filename_lookup()
  rename __filename_parentat() to filename_parentat()
  namei: Fix use after free in kern_path_locked

3 years agoMerge tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 12 Sep 2021 17:10:21 +0000 (10:10 -0700)]
Merge tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull smbfs updates from Steve French:
 "cifs/smb3 updates:

   - DFS reconnect fix

   - begin creating common headers for server and client

   - rename the cifs_common directory to smbfs_common to be more
     consistent ie change use of the name cifs to smb (smb3 or smbfs is
     more accurate, as the very old cifs dialect has long been
     superseded by smb3 dialects).

  In the future we can rename the fs/cifs directory to fs/smbfs.

  This does not include the set of multichannel fixes nor the two
  deferred close fixes (they are still being reviewed and tested)"

* tag '5.15-rc-cifs-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: properly invalidate cached root handle when closing it
  cifs: move SMB FSCTL definitions to common code
  cifs: rename cifs_common to smbfs_common
  cifs: update FSCTL definitions

3 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sat, 11 Sep 2021 21:48:42 +0000 (14:48 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - vduse driver ("vDPA Device in Userspace") supporting emulated virtio
   block devices

 - virtio-vsock support for end of record with SEQPACKET

 - vdpa: mac and mq support for ifcvf and mlx5

 - vdpa: management netlink for ifcvf

 - virtio-i2c, gpio dt bindings

 - misc fixes and cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits)
  Documentation: Add documentation for VDUSE
  vduse: Introduce VDUSE - vDPA Device in Userspace
  vduse: Implement an MMU-based software IOTLB
  vdpa: Support transferring virtual addressing during DMA mapping
  vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
  vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
  vhost-iotlb: Add an opaque pointer for vhost IOTLB
  vhost-vdpa: Handle the failure of vdpa_reset()
  vdpa: Add reset callback in vdpa_config_ops
  vdpa: Fix some coding style issues
  file: Export receive_fd() to modules
  eventfd: Export eventfd_wake_count to modules
  iova: Export alloc_iova_fast() and free_iova_fast()
  virtio-blk: remove unneeded "likely" statements
  virtio-balloon: Use virtio_find_vqs() helper
  vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
  vsock_test: update message bounds test for MSG_EOR
  af_vsock: rename variables in receive loop
  virtio/vsock: support MSG_EOR bit processing
  vhost/vsock: support MSG_EOR bit processing
  ...

3 years agoMerge tag 'riscv-for-linus-5.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 11 Sep 2021 21:29:42 +0000 (14:29 -0700)]
Merge tag 'riscv-for-linus-5.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - A pair of defconfig additions, for NVMe and the EFI filesystem
   localization options.

 - A larger address space for stack randomization.

 - A cleanup to our install rules.

 - A DTS update for the Microchip Icicle board, to fix the serial
   console.

 - Support for build-time table sorting, which allows us to have
   __ex_table read-only.

* tag 'riscv-for-linus-5.15-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Move EXCEPTION_TABLE to RO_DATA segment
  riscv: Enable BUILDTIME_TABLE_SORT
  riscv: dts: microchip: mpfs-icicle: Fix serial console
  riscv: move the (z)install rules to arch/riscv/Makefile
  riscv: Improve stack randomisation on RV64
  riscv: defconfig: enable NLS_CODEPAGE_437, NLS_ISO8859_1
  riscv: defconfig: enable BLK_DEV_NVME

3 years agoMerge branch 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall...
Linus Torvalds [Sat, 11 Sep 2021 21:22:28 +0000 (14:22 -0700)]
Merge branch 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux

Pull coccinelle updates from Julia Lawall:
 "These changes update some existing semantic patches with
  respect to some recent changes in the kernel.

  Specifically, the change to kvmalloc.cocci searches for
  kfree_sensitive rather than kzfree, and the change to
  use_after_iter.cocci adds list_entry_is_head as a valid
  use of a list iterator index variable after the end of
  the loop"

* 'for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
  scripts: coccinelle: allow list_entry_is_head() to use pos
  coccinelle: api: rename kzfree to kfree_sensitive

3 years agotools headers UAPI: Update tools's copy of drm.h headers
Arnaldo Carvalho de Melo [Mon, 3 May 2021 14:48:26 +0000 (11:48 -0300)]
tools headers UAPI: Update tools's copy of drm.h headers

Picking the changes from:

  e6dbdf61707cbf18 ("drm: document DRM_IOCTL_MODE_RMFB")

Doesn't result in any tooling changes:

  $ tools/perf/trace/beauty/drm_ioctl.sh  > before
  $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h
  $ tools/perf/trace/beauty/drm_ioctl.sh  > after
  $ diff -u before after

Silencing these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/drm.h' differs from latest version at 'include/uapi/drm/drm.h'
  diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h

Cc: Simon Ser <contact@emersion.fr>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync drm/i915_drm.h with the kernel sources
Arnaldo Carvalho de Melo [Sat, 11 Sep 2021 19:18:28 +0000 (16:18 -0300)]
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources

To pick the changes in:

  140390a6006a5e41 ("drm/i915/userptr: Probe existence of backing struct pages upon creation")
  417b9d863129a967 ("drm/i915/guc: Implement GuC priority management")
  1db4fe8dfd673642 ("drm/i915/uapi: reject set_domain for discrete")
  2e4d96dd1ef7b1d7 ("drm/i915: Add TTM offset argument to mmap.")
  28027a48adef1d13 ("drm/i915/uapi: convert drm_i915_gem_userptr to kernel doc")
  c40ba9e631208a3f ("drm/i915/uapi: reject caching ioctls for discrete")
  1c988c4717e3b75f ("drm/i915/uapi: convert drm_i915_gem_set_domain to kernel doc")
  67fe9a938ff3a344 ("drm/i915/uapi: convert drm_i915_gem_caching to kernel doc")
  de411643c8a1f036 ("drm/i915: Drop the CONTEXT_CLONE API (v2)")
  b0ef98395b3a7c44 ("drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP")
  a86773c64360180c ("drm/i915: Drop I915_CONTEXT_PARAM_RINGSIZE")
  2bbd25fd64b0cbca ("drm/i915: Document the Virtual Engine uAPI")
  fa7a9cdddfeca826 ("drm/i915: Fix busy ioctl commentary")

That doesn't result in any changes to tooling as no new ioctl were
added (at least not perceived by tools/perf/trace/beauty/drm_ioctl.sh).

Addressing this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
  diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync linux/fs.h with the kernel sources
Arnaldo Carvalho de Melo [Fri, 21 May 2021 19:00:31 +0000 (16:00 -0300)]
tools headers UAPI: Sync linux/fs.h with the kernel sources

To pick the change in:

  826ee0cf5f47c922 ("block: add ioctl to read the disk sequence number")

It adds a new ioctl, but we are still not using that to generate tables
for 'perf trace', so no changes in tooling.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/fs.h' differs from latest version at 'include/uapi/linux/fs.h'
  diff -u tools/include/uapi/linux/fs.h include/uapi/linux/fs.h

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync linux/in.h copy with the kernel sources
Arnaldo Carvalho de Melo [Sat, 19 Jun 2021 13:15:22 +0000 (10:15 -0300)]
tools headers UAPI: Sync linux/in.h copy with the kernel sources

To pick the changes in:

  a2386e0b98ec319c ("net/ipv4/ipv6: Replace one-element arraya with flexible-array members")
  325ee0c57a8b43cf ("net/ipv4: Replace one-element array with flexible-array member")

That don't result in any change in tooling, the structs changed remains
with the same layout.

This addresses this build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/in.h' differs from latest version at 'include/uapi/linux/in.h'
  diff -u tools/include/uapi/linux/in.h include/uapi/linux/in.h

Cc: David S. Miller <davem@davemloft.net>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf tools: Add an option to build without libbfd
Ian Rogers [Fri, 10 Sep 2021 22:57:56 +0000 (15:57 -0700)]
perf tools: Add an option to build without libbfd

Some distributions, like debian, don't link perf with libbfd. Add a
build flag to make this configuration buildable and testable.

This was inspired by:

  https://lore.kernel.org/linux-perf-users/20210910102307.2055484-1-tonyg@leastfixedpoint.com/T/#u

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: tony garnock-jones <tonyg@leastfixedpoint.com>
Link: http://lore.kernel.org/lkml/20210910225756.729087-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf tools: Allow build-id with trailing zeros
Namhyung Kim [Fri, 10 Sep 2021 22:46:30 +0000 (15:46 -0700)]
perf tools: Allow build-id with trailing zeros

Currently perf saves a build-id with size but old versions assumes the
size of 20.  In case the build-id is less than 20 (like for MD5), it'd
fill the rest with 0s.

I saw a problem when old version of perf record saved a binary in the
build-id cache and new version of perf reads the data.  The symbols
should be read from the build-id cache (as the path no longer has the
same binary) but it failed due to mismatch in the build-id.

  symsrc__init: build id mismatch for /home/namhyung/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf.

The build-id event in the data has 20 byte build-ids, but it saw a
different size (16) when it reads the build-id of the elf file in the
build-id cache.

  $ readelf -n ~/.debug/.build-id/53/e4c2f42a4c61a2d632d92a72afa08f00000000/elf

  Displaying notes found in: .note.gnu.build-id
    Owner                Data size  Description
    GNU                  0x00000010 NT_GNU_BUILD_ID (unique build ID bitstring)
      Build ID: 53e4c2f42a4c61a2d632d92a72afa08f

Let's fix this by allowing trailing zeros if the size is different.

Fixes: af35db0b5eeeee5b ("perf tools: Pass build_id object to dso__build_id_equal()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210910224630.1084877-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf tools: Fix hybrid config terms list corruption
Adrian Hunter [Thu, 9 Sep 2021 12:55:08 +0000 (15:55 +0300)]
perf tools: Fix hybrid config terms list corruption

A config terms list was spliced twice, resulting in a never-ending loop
when the list was traversed. Fix by using list_splice_init() and copying
and freeing the lists as necessary.

This patch also depends on patch "perf tools: Factor out
copy_config_terms() and free_config_terms()"

Example on ADL:

 Before:

  # perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname &
  # jobs
  [1]+  Running                    perf record -e "{intel_pt//,cycles/aux-sample-size=4096/pp}" uname
  # perf top -E 10
    PerfTop:    4071 irqs/sec  kernel: 6.9%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 24 CPUs)
  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    97.60%  perf           [.] __evsel__get_config_term
     0.25%  [kernel]       [k] kallsyms_expand_symbol.constprop.13
     0.24%  perf           [.] kallsyms__parse
     0.15%  [kernel]       [k] _raw_spin_lock
     0.14%  [kernel]       [k] number
     0.13%  [kernel]       [k] advance_transaction
     0.08%  [kernel]       [k] format_decode
     0.08%  perf           [.] map__process_kallsym_symbol
     0.08%  perf           [.] rb_insert_color
     0.08%  [kernel]       [k] vsnprintf
  exiting.
  # kill %1

After:

  # perf record -e '{intel_pt//,cycles/aux-sample-size=4096/pp}' uname &
  Linux
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.060 MB perf.data ]
  # perf script | head
       perf-exec   604 [001]  1827.312293:                            psb:  psb offs: 0                       ffffffffb8415e87 pt_config_start+0x37 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb856a3bd event_sched_in.isra.133+0xfd ([kernel.kallsyms]) => ffffffffb856a9a0 perf_pmu_nop_void+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb856b10e merge_sched_in+0x26e ([kernel.kallsyms]) => ffffffffb856a2c0 event_sched_in.isra.133+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb856a45d event_sched_in.isra.133+0x19d ([kernel.kallsyms]) => ffffffffb8568b80 perf_event_set_state.part.61+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb8568b86 perf_event_set_state.part.61+0x6 ([kernel.kallsyms]) => ffffffffb85662a0 perf_event_update_time+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb856a35c event_sched_in.isra.133+0x9c ([kernel.kallsyms]) => ffffffffb8567610 perf_log_itrace_start+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb856a377 event_sched_in.isra.133+0xb7 ([kernel.kallsyms]) => ffffffffb8403b40 x86_pmu_add+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb8403b86 x86_pmu_add+0x46 ([kernel.kallsyms]) => ffffffffb8403940 collect_events+0x0 ([kernel.kallsyms])
       perf-exec   604  1827.312293:          1                       branches:  ffffffffb8403a7b collect_events+0x13b ([kernel.kallsyms]) => ffffffffb8402cd0 collect_event+0x0 ([kernel.kallsyms])

Fixes: 75c4bf188ce928 ("perf parse-events Create two hybrid cache events")
Fixes: d5e075f33ee1c3 ("perf parse-events Create two hybrid raw events")
Fixes: 63d0057e79059b ("perf parse-events Create two hybrid hardware events")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https //lore.kernel.org/r/20210909125508.28693-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf tools: Factor out copy_config_terms() and free_config_terms()
Adrian Hunter [Thu, 9 Sep 2021 12:55:07 +0000 (15:55 +0300)]
perf tools: Factor out copy_config_terms() and free_config_terms()

Factor out copy_config_terms() and free_config_terms() so that they can
be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Link: https //lore.kernel.org/r/20210909125508.28693-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf tools: Fix perf_event_attr__fprintf() missing/dupl. fields
Adrian Hunter [Sat, 11 Sep 2021 12:05:50 +0000 (15:05 +0300)]
perf tools: Fix perf_event_attr__fprintf() missing/dupl. fields

Some fields are missing and text_poke is duplicated. Fix that up.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lore.kernel.org/lkml/20210911120550.12203-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf tools: Ignore Documentation dependency file
Ian Rogers [Fri, 10 Sep 2021 23:22:49 +0000 (16:22 -0700)]
perf tools: Ignore Documentation dependency file

When building directly on the checked out repository the build process
produces a file that should be ignored, so add it to .gitignore.

Fixes: 64790d2269da9bd5 ("perf doc: Fix doc.dep")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210910232249.739661-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoMerge tag 'io_uring-5.15-2021-09-11' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 11 Sep 2021 17:28:14 +0000 (10:28 -0700)]
Merge tag 'io_uring-5.15-2021-09-11' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Fix an off-by-one in a BUILD_BUG_ON() check. Not a real issue right
   now as we have plenty of flags left, but could become one. (Hao)

 - Fix lockdep issue introduced in this merge window (me)

 - Fix a few issues with the worker creation (me, Pavel, Qiang)

 - Fix regression with wq_has_sleeper() for IOPOLL (Pavel)

 - Timeout link error propagation fix (Pavel)

* tag 'io_uring-5.15-2021-09-11' of git://git.kernel.dk/linux-block:
  io_uring: fix off-by-one in BUILD_BUG_ON check of __REQ_F_LAST_BIT
  io_uring: fail links of cancelled timeouts
  io-wq: fix memory leak in create_io_worker()
  io-wq: fix silly logic error in io_task_work_match()
  io_uring: drop ctx->uring_lock before acquiring sqd->lock
  io_uring: fix missing mb() before waitqueue_active
  io-wq: fix cancellation on create-worker failure

3 years agoMerge tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 11 Sep 2021 17:19:51 +0000 (10:19 -0700)]
Merge tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - NVMe pull request from Christoph:
     - fix nvmet command set reporting for passthrough controllers (Adam Manzanares)
     - update a MAINTAINERS email address (Chaitanya Kulkarni)
     - set QUEUE_FLAG_NOWAIT for nvme-multipth (me)
     - handle errors from add_disk() (Luis Chamberlain)
     - update the keep alive interval when kato is modified (Tatsuya Sasaki)
     - fix a buffer overrun in nvmet_subsys_attr_serial (Hannes Reinecke)
     - do not reset transport on data digest errors in nvme-tcp (Daniel Wagner)
     - only call synchronize_srcu when clearing current path (Daniel Wagner)
     - revalidate paths during rescan (Hannes Reinecke)

 - Split out the fs/block_dev into block/fops.c and block/bdev.c, which
   has been long overdue. Do this now before -rc1, to avoid annoying
   conflicts due to this (Christoph)

 - blk-throtl use-after-free fix (Li)

 - Improve plug depth for multi-device plugs, greatly increasing md
   resync performance (Song)

 - blkdev_show() locking fix (Tetsuo)

 - n64cart error check fix (Yang)

* tag 'block-5.15-2021-09-11' of git://git.kernel.dk/linux-block:
  n64cart: fix return value check in n64cart_probe()
  blk-mq: allow 4x BLK_MAX_REQUEST_COUNT at blk_plug for multiple_queues
  block: move fs/block_dev.c to block/bdev.c
  block: split out operations on block special files
  blk-throttle: fix UAF by deleteing timer in blk_throtl_exit()
  block: genhd: don't call blkdev_show() with major_names_lock held
  nvme: update MAINTAINERS email address
  nvme: add error handling support for add_disk()
  nvme: only call synchronize_srcu when clearing current path
  nvme: update keep alive interval when kato is modified
  nvme-tcp: Do not reset transport on data digest errors
  nvmet: fixup buffer overrun in nvmet_subsys_attr_serial()
  nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req
  nvmet: looks at the passthrough controller when initializing CAP
  nvme: move nvme_multi_css into nvme.h
  nvme-multipath: revalidate paths during rescan
  nvme-multipath: set QUEUE_FLAG_NOWAIT

3 years agoMerge tag 'libata-5.15-2021-09-11' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 11 Sep 2021 17:18:20 +0000 (10:18 -0700)]
Merge tag 'libata-5.15-2021-09-11' of git://git.kernel.dk/linux-block

Pull libata maintainer update from Jens Axboe:
 "Damien agreed to take over maintainership of libata, and he would be a
  great candidate for it. Update the MAINTAINERS entry to reflect the
  change in maintainer and git tree"

* tag 'libata-5.15-2021-09-11' of git://git.kernel.dk/linux-block:
  libata: pass over maintainership to Damien Le Moal

3 years agoMerge tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Sat, 11 Sep 2021 17:16:30 +0000 (10:16 -0700)]
Merge tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
 "Minor fixes to the processing of the bootconfig tree"

* tag 'trace-v5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: Rename xbc_node_find_child() to xbc_node_find_subkey()
  tracing/boot: Fix to check the histogram control param is a leaf node
  tracing/boot: Fix trace_boot_hist_add_array() to check array is value

3 years agoMerge tag 'devicetree-fixes-for-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 11 Sep 2021 17:12:46 +0000 (10:12 -0700)]
Merge tag 'devicetree-fixes-for-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree fixes from Rob Herring:

 - Disable fw_devlinks on x86 DT platforms to fix OLPC

 - More replacing oneOf+const with enum on a few new schemas

 - Drop unnecessary type references on Xilinx SPI binding schema

* tag 'devicetree-fixes-for-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
  spi: dt-bindings: xilinx: Drop type reference on *-bits properties
  dt-bindings: More use 'enum' instead of 'oneOf' plus 'const' entries
  of: property: Disable fw_devlink DT support for X86

3 years agoMerge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Linus Torvalds [Sat, 11 Sep 2021 17:05:56 +0000 (10:05 -0700)]
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fix from Stephen Boyd:
 "One patch to fix an unused variable warning in a Qualcomm clk driver"

* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: qcom: gcc-sm6350: Remove unused variable

3 years agoMerge tag 'rtc-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Linus Torvalds [Sat, 11 Sep 2021 16:54:53 +0000 (09:54 -0700)]
Merge tag 'rtc-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "The broken down time conversion is similar to what is done in the time
  subsystem since v5.14. The rest is fairly straightforward.

  Subsystem:
   - Switch to Neri and Schneider time conversion algorithm

  Drivers:
   - rx8025: add rx8035 support
   - s5m: modernize driver and set range"

* tag 'rtc-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: rx8010: select REGMAP_I2C
  dt-bindings: rtc: add Epson RX-8025 and RX-8035
  rtc: rx8025: implement RX-8035 support
  rtc: cmos: remove stale REVISIT comments
  rtc: tps65910: Correct driver module alias
  rtc: move RTC_LIB_KUNIT_TEST to proper location
  rtc: lib_test: add MODULE_LICENSE
  rtc: Improve performance of rtc_time64_to_tm(). Add tests.
  rtc: s5m: set range
  rtc: s5m: enable wakeup only when available
  rtc: s5m: signal the core when alarm are not available
  rtc: s5m: switch to devm_rtc_allocate_device

3 years agoMerge tag 'firewire-update' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394...
Linus Torvalds [Sat, 11 Sep 2021 16:47:33 +0000 (09:47 -0700)]
Merge tag 'firewire-update' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire updates from Stefan Richter:

 - Migrate the bus snooper driver 'nosy' from PCI to DMA API

 - Small janitorial cleanup in the IPv4/v6-over-1394 driver

[ The 'nosy' change already come in as a different commit through Greg
  KH in the misc tree back in the previous merge window, so only the
  cleanup ends up being new to 5.15   - Linus ]

* tag 'firewire-update' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: nosy: switch from 'pci_' to 'dma_' API
  firewire: net: remove unused variable 'guid'

3 years agoMerge tag 'pwm/for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry...
Linus Torvalds [Sat, 11 Sep 2021 16:26:00 +0000 (09:26 -0700)]
Merge tag 'pwm/for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm

Pull pwm updates from Thierry Reding:
 "The changes this time around are mostly janitorial in nature. A lot of
  this is simplifications of drivers using device-managed functions and
  improving compilation coverage.

  The Mediatek display PWM driver now supports the atomic API.

  Cleanups and minor fixes make up the remainder of this set"

* tag 'pwm/for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (54 commits)
  pwm: mtk-disp: Implement atomic API .get_state()
  pwm: mtk-disp: Fix overflow in period and duty calculation
  pwm: mtk-disp: Implement atomic API .apply()
  pwm: mtk-disp: Adjust the clocks to avoid them mismatch
  dt-bindings: pwm: rockchip: Add description for rk3568
  pwm: Make pwmchip_remove() return void
  pwm: sun4i: Don't check the return code of pwmchip_remove()
  pwm: sifive: Don't check the return code of pwmchip_remove()
  pwm: samsung: Don't check the return code of pwmchip_remove()
  pwm: renesas-tpu: Don't check the return code of pwmchip_remove()
  pwm: rcar: Don't check the return code of pwmchip_remove()
  pwm: pca9685: Don't check the return code of pwmchip_remove()
  pwm: omap-dmtimer: Don't check the return code of pwmchip_remove()
  pwm: mtk-disp: Don't check the return code of pwmchip_remove()
  pwm: imx-tpm: Don't check the return code of pwmchip_remove()
  pwm: img: Don't check the return code of pwmchip_remove()
  pwm: cros-ec: Don't check the return code of pwmchip_remove()
  pwm: brcmstb: Don't check the return code of pwmchip_remove()
  pwm: atmel-tcb: Don't check the return code of pwmchip_remove()
  pwm: atmel-hlcdc: Don't check the return code of pwmchip_remove()
  ...

3 years agoMerge tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/therma...
Linus Torvalds [Sat, 11 Sep 2021 16:20:57 +0000 (09:20 -0700)]
Merge tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal updates from Daniel Lezcano:

 - Add the tegra3 thermal sensor and fix the compilation testing on
   tegra by adding a dependency on ARCH_TEGRA along with COMPILE_TEST
   (Dmitry Osipenko)

 - Fix the error code for the exynos when devm_get_clk() fails (Dan
   Carpenter)

 - Add the TCC cooling support for AlderLake platform (Sumeet Pawnikar)

 - Add support for hardware trip points for the rcar gen3 thermal driver
   and store TSC id as unsigned int (Niklas Söderlund)

 - Replace the deprecated CPU-hotplug functions get_online_cpus() and
   put_online_cpus (Sebastian Andrzej Siewior)

 - Add the thermal tools directory in the MAINTAINERS file (Daniel
   Lezcano)

 - Fix the Makefile and the cross compilation flags for the userspace
   'tmon' tool (Rolf Eike Beer)

 - Allow to use the IMOK independently from the GDDV on Int340x (Sumeet
   Pawnikar)

 - Fix the stub thermal_cooling_device_register() function prototype
   which does not match the real function (Arnd Bergmann)

 - Make the thermal trip point optional in the DT bindings (Maxime
   Ripard)

 - Fix a typo in a comment in the core code (Geert Uytterhoeven)

 - Reduce the verbosity of the trace in the SoC thermal tegra driver
   (Dmitry Osipenko)

 - Add the support for the LMh (Limit Management hardware) driver on the
   QCom platforms (Thara Gopinath)

 - Allow processing of HWP interrupt by adding a weak function in the
   Intel driver (Srinivas Pandruvada)

 - Prevent an abort of the sensor probe is a channel is not used
   (Matthias Kaehlcke)

* tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/qcom/spmi-adc-tm5: Don't abort probing if a sensor is not used
  thermal/drivers/intel: Allow processing of HWP interrupt
  dt-bindings: thermal: Add dt binding for QCOM LMh
  thermal/drivers/qcom: Add support for LMh driver
  firmware: qcom_scm: Introduce SCM calls to access LMh
  thermal/drivers/tegra-soctherm: Silence message about clamped temperature
  thermal: Spelling s/scallbacks/callbacks/
  dt-bindings: thermal: Make trips node optional
  thermal/core: Fix thermal_cooling_device_register() prototype
  thermal/drivers/int340x: Use IMOK independently
  tools/thermal/tmon: Add cross compiling support
  thermal/tools/tmon: Improve the Makefile
  MAINTAINERS: Add missing userspace thermal tools to the thermal section
  thermal/drivers/intel_powerclamp: Replace deprecated CPU-hotplug functions.
  thermal/drivers/rcar_gen3_thermal: Store TSC id as unsigned int
  thermal/drivers/rcar_gen3_thermal: Add support for hardware trip points
  drivers/thermal/intel: Add TCC cooling support for AlderLake platform
  thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
  thermal/drivers/tegra: Correct compile-testing of drivers
  thermal/drivers/tegra: Add driver for Tegra30 thermal sensor

3 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Sat, 11 Sep 2021 16:08:28 +0000 (09:08 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input updates from Dmitry Torokhov:

 - several device tree bindings for input devices have been converted to
   yaml

 - dropped no longer used ixp4xx-beeper and CSR Prima2 PWRC drivers

 - analog joystick has been converted to use ktime API and no longer
   warn about low resolution timers

 - a few driver fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
  Input: analog - always use ktime functions
  Input: mms114 - support MMS134S
  Input: elan_i2c - reduce the resume time for controller in Whitebox
  Input: edt-ft5x06 - added case for EDT EP0110M09
  Input: adc-keys - drop bogus __refdata annotation
  Input: Fix spelling mistake in Kconfig "useable" -> "usable"
  Input: Fix spelling mistake in Kconfig "Modul" -> "Module"
  Input: remove dead CSR Prima2 PWRC driver
  Input: adp5589-keys - use the right header
  Input: adp5588-keys - use the right header
  dt-bindings: input: tsc2005: Convert to YAML schema
  Input: ep93xx_keypad - prepare clock before using it
  dt-bindings: input: sun4i-lradc: Add wakeup-source
  dt-bindings: input: Convert Regulator Haptic binding to a schema
  dt-bindings: input: Convert Pixcir Touchscreen binding to a schema
  dt-bindings: input: Convert ChipOne ICN8318 binding to a schema
  Input: pm8941-pwrkey - fix comma vs semicolon issue
  dt-bindings: power: reset: qcom-pon: Convert qcom PON binding to yaml
  dt-bindings: input: pm8941-pwrkey: Convert pm8941 power key binding to yaml
  dt-bindings: power: reset: Change 'additionalProperties' to true
  ...

3 years agoriscv: Move EXCEPTION_TABLE to RO_DATA segment
Jisheng Zhang [Thu, 26 Aug 2021 14:11:18 +0000 (22:11 +0800)]
riscv: Move EXCEPTION_TABLE to RO_DATA segment

_ex_table section is read-only, so move it to RO_DATA.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Enable BUILDTIME_TABLE_SORT
Jisheng Zhang [Thu, 26 Aug 2021 14:10:29 +0000 (22:10 +0800)]
riscv: Enable BUILDTIME_TABLE_SORT

Enable BUILDTIME_TABLE_SORT to sort the exception table at build time
rather than during boot.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: dts: microchip: mpfs-icicle: Fix serial console
Geert Uytterhoeven [Thu, 26 Aug 2021 13:19:39 +0000 (15:19 +0200)]
riscv: dts: microchip: mpfs-icicle: Fix serial console

Currently, nothing is output on the serial console, unless
"console=ttyS0,115200n8" or "earlycon" are appended to the kernel
command line.  Enable automatic console selection using
chosen/stdout-path by adding a proper alias, and configure the expected
serial rate.

While at it, add aliases for the other three serial ports, which are
provided on the same micro-USB connector as the first one.

Fixes: 6b215eb5b39be1ae ("RISC-V: Initial DTS for Microchip ICICLE board")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: move the (z)install rules to arch/riscv/Makefile
Masahiro Yamada [Thu, 29 Jul 2021 14:21:47 +0000 (23:21 +0900)]
riscv: move the (z)install rules to arch/riscv/Makefile

Currently, the (z)install targets in arch/riscv/Makefile descend into
arch/riscv/boot/Makefile to invoke the shell script, but there is no
good reason to do so.

arch/riscv/Makefile can run the shell script directly.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: Improve stack randomisation on RV64
Kefeng Wang [Thu, 12 Aug 2021 11:47:02 +0000 (19:47 +0800)]
riscv: Improve stack randomisation on RV64

This enlarges the bits availiable for stack randomisation on RV64 from
the default of 8MiB to 1GiB, to match arm64 and x86.

Also, update the documentation to reflect our support for stack
randomisation.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
[Palmer: commit text]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: defconfig: enable NLS_CODEPAGE_437, NLS_ISO8859_1
Heinrich Schuchardt [Thu, 12 Aug 2021 08:10:27 +0000 (10:10 +0200)]
riscv: defconfig: enable NLS_CODEPAGE_437, NLS_ISO8859_1

The EFI system partition uses the FAT file system. Many distributions add
an entry in /etc/fstab for the ESP. We must ensure that mounting does not
fail.

The default code page for FAT is 437 (cf. CONFIG_FAT_DEFAULT_CODEPAGE).
The default IO character set is "iso8859-1" (cf. CONFIG_NLS_ISO8859_1).

So let's enable NLS_CODEPAGE_437 and NLS_ISO8859_1 in defconfig.

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoriscv: defconfig: enable BLK_DEV_NVME
Heinrich Schuchardt [Thu, 12 Aug 2021 08:10:26 +0000 (10:10 +0200)]
riscv: defconfig: enable BLK_DEV_NVME

NVMe is a non-volatile storage media attached via PCIe.
As NVMe has much higher throughput than other block devices like
SATA it is a must have for RISC-V. Enable CONFIG_BLK_DEV_NVME.

The HiFive Unmatched is a board providing M.2 slots for NVMe drives.
Enable CONFIG_PCIE_FU740.

Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
3 years agoDocumentation: core-api/cpuhotplug: Rewrite the API section
Thomas Gleixner [Thu, 9 Sep 2021 12:34:59 +0000 (14:34 +0200)]
Documentation: core-api/cpuhotplug: Rewrite the API section

Dave stumbled over the incomplete and confusing documentation of the CPU
hotplug API.

Rewrite it, add the missing function documentations and correct the
existing ones.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210909123212.489059409@linutronix.de
3 years agocpu/hotplug: Remove deprecated CPU-hotplug functions.
Sebastian Andrzej Siewior [Tue, 3 Aug 2021 14:16:21 +0000 (16:16 +0200)]
cpu/hotplug: Remove deprecated CPU-hotplug functions.

No users in tree use the deprecated CPU-hotplug functions anymore.

Remove them.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-39-bigeasy@linutronix.de
3 years agothermal: Replace deprecated CPU-hotplug functions.
Sebastian Andrzej Siewior [Tue, 3 Aug 2021 14:16:02 +0000 (16:16 +0200)]
thermal: Replace deprecated CPU-hotplug functions.

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-20-bigeasy@linutronix.de
3 years agoMerge branch 'linus' into smp/urgent
Thomas Gleixner [Fri, 10 Sep 2021 22:38:47 +0000 (00:38 +0200)]
Merge branch 'linus' into smp/urgent

Ensure that all usage sites of get/put_online_cpus() except for the
struggler in drivers/thermal are gone. So the last user and the deprecated
inlines can be removed.

3 years agoperf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions
Arnaldo Carvalho de Melo [Fri, 10 Sep 2021 20:26:53 +0000 (17:26 -0300)]
perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions

The btf__get_from_id() function was deprecated in favour of
btf__load_from_kernel_by_id(), but it is still avaiable, so use it to
provide a weak function btf__load_from_kernel_by_id() for older libbpf
when building perf with LIBBPF_DYNAMIC=1, i.e. using the system's libbpf
package.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools include UAPI: Update linux/mount.h copy
Arnaldo Carvalho de Melo [Thu, 1 Jul 2021 16:34:38 +0000 (13:34 -0300)]
tools include UAPI: Update linux/mount.h copy

To pick the changes from:

  ccf292f7f11338ea ("move_mount: allow to add a mount into an existing group")

That ends up adding support for the new MOVE_MOUNT_SET_GROUP move_mount
flag.

  $ tools/perf/trace/beauty/move_mount_flags.sh > before
  $ cp include/uapi/linux/mount.h tools/include/uapi/linux/mount.h
  $ tools/perf/trace/beauty/move_mount_flags.sh > after
  $ diff -u before after
  --- before 2021-09-10 12:28:43.865279808 -0300
  +++ after 2021-09-10 12:28:50.183429184 -0300
  @@ -5,4 +5,5 @@
    [ilog2(0x00000010) + 1] = "T_SYMLINKS",
    [ilog2(0x00000020) + 1] = "T_AUTOMOUNTS",
    [ilog2(0x00000040) + 1] = "T_EMPTY_PATH",
  + [ilog2(0x00000100) + 1] = "SET_GROUP",
   };
  $

So now one can use it in --filter expressions for tracepoints.

This silences this perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/mount.h' differs from latest version at 'include/uapi/linux/mount.h'
  diff -u tools/include/uapi/linux/mount.h include/uapi/linux/mount.h

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf beauty: Cover more flags in the move_mount syscall argument beautifier
Arnaldo Carvalho de Melo [Fri, 10 Sep 2021 15:24:19 +0000 (12:24 -0300)]
perf beauty: Cover more flags in the  move_mount syscall argument beautifier

Previously the regext expected MOVE_MOUNT_[FT]_*, but in the next patch
a flag that doesn't match that expression will be added, MOVE_MOUNT_SET_GROUP

To make this more future proof, take advantage of the fact that the only
one we don't need to cover is MOVE_MOUNT__MASK and use MOVE_MOUNT_[^_]+_*_.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync linux/prctl.h with the kernel sources
Arnaldo Carvalho de Melo [Thu, 11 Feb 2021 15:50:52 +0000 (12:50 -0300)]
tools headers UAPI: Sync linux/prctl.h with the kernel sources

To pick the changes in:

  d8ae933666b29367 ("arm64: mte: change ASYNC and SYNC TCF settings into bitfields")
  1cab2c6357cdf3ad ("x86, prctl: Hook L1D flushing in via prctl")

That don't result in any changes in tooling:

  $ tools/perf/trace/beauty/prctl_option.sh > before
  $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h
  $ tools/perf/trace/beauty/prctl_option.sh > after
  $ diff -u before after
  $

Just silences this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/prctl.h' differs from latest version at 'include/uapi/linux/prctl.h'
  diff -u tools/include/uapi/linux/prctl.h include/uapi/linux/prctl.h

Cc: Balbir Singh <sblbir@amazon.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools include UAPI: Sync sound/asound.h copy with the kernel sources
Arnaldo Carvalho de Melo [Wed, 12 Feb 2020 14:04:23 +0000 (11:04 -0300)]
tools include UAPI: Sync sound/asound.h copy with the kernel sources

Picking the changes from:

  6fb6249932f51e92 ("ALSA: pcm: Add SNDRV_PCM_INFO_EXPLICIT_SYNC flag")

Which entails no changes in the tooling side as it doesn't introduce new
ioctls.

To silence this perf tools build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
  diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h

Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync linux/kvm.h with the kernel sources
Arnaldo Carvalho de Melo [Sun, 9 May 2021 12:39:02 +0000 (09:39 -0300)]
tools headers UAPI: Sync linux/kvm.h with the kernel sources

To pick the changes in:

  7868023b79c39dc2 ("KVM: stats: Support linear and logarithmic histogram statistics")
  0b053623cad6fcaf ("KVM: arm64: Add ioctl to fetch/store tags in a guest")
  ef0072e156e41615 ("KVM: arm64: Introduce MTE VM feature")

That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.

This is also by now used by tools/testing/selftests/kvm/, so that will
pick the new KVM_STATS_TYPE_LINEAR_HIST and KVM_STATS_TYPE_LOG_HIST
defines.

This silences this perf build warning:

  Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
  diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

Cc: Jing Zhang <jingzhangos@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Steven Price <steven.price@arm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agotools headers UAPI: Sync x86's asm/kvm.h with the kernel sources
Arnaldo Carvalho de Melo [Fri, 10 Sep 2021 14:46:54 +0000 (11:46 -0300)]
tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources

To pick the changes in:

  997520b3b92d5708 ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")

That just rebuilds kvm-stat.c on x86, no change in functionality.

This silences these perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
  diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf report: Add support to print a textual representation of IBS raw sample data
Kim Phillips [Tue, 17 Aug 2021 22:15:09 +0000 (17:15 -0500)]
perf report: Add support to print a textual representation of IBS raw sample data

Perf records IBS (Instruction Based Sampling) extra sample data when
'perf record --raw-samples' is used with an IBS-compatible event, on a
machine that supports IBS.  IBS support is indicated in
CPUID_Fn80000001_ECX bit #10.

Up until now, users have been able to see the extra sample data solely
in raw hex format using 'perf report --dump-raw-trace'.  From there,
users could decode the data either manually, or by using an external
script.

Enable the built-in 'perf report --dump-raw-trace' to do the decoding of
the extra sample data bits, so manual or external script decoding isn't
necessary.

Example usage:

  $ sudo perf record -c 10000001 -a --raw-samples -e ibs_fetch/rand_en=1/,ibs_op/cnt_ctl=1/ -C 0,1 taskset -c 0,1 7za b -mmt2 | perf report --dump-raw-trace

Stdout contains IBS Fetch samples, e.g.:

  ibs_fetch_ctl: 02170007ffffffff MaxCnt 1048560 Cnt 1048560 Lat     7 En 1 Val 1 Comp 1 IcMiss 0 PhyAddrValid 1 L1TlbPgSz 4KB L1TlbMiss 0 L2TlbMiss 0 RandEn 1 L2Miss 0
  IbsFetchLinAd: 000056016b2ead40
  IbsFetchPhysAd: 000000115cedfd40
  c_ibs_ext_ctl: 0000000000000000 IbsItlbRefillLat   0

..and IBS Op samples, e.g.:

  ibs_op_ctl: 0000009e009e8968 MaxCnt  10000000 En 1 Val 1 CntCtl 1=uOps CurCnt       158
  IbsOpRip: 000056016b2ea73d
  ibs_op_data: 00000000000b0002 CompToRetCtr     2 TagToRetCtr    11 BrnRet 0  RipInvalid 0 BrnFuse 0 Microcode 0
  ibs_op_data2: 0000000000000002 CacheHitSt 0=M-state RmtNode 0 DataSrc 2=Local node cache
  ibs_op_data3: 0000000000c60002 LdOp 0 StOp 1 DcL1TlbMiss 0 DcL2TlbMiss 0 DcL1TlbHit2M 0 DcL1TlbHit1G 0 DcL2TlbHit2M 0 DcMiss 0 DcMisAcc 0 DcWcMemAcc 0 DcUcMemAcc 0 DcLockedOp 0 DcMissNoMabAlloc 0 DcLinAddrValid 1 DcPhyAddrValid 1 DcL2TlbHit1G 0 L2Miss 0 SwPf 0 OpMemWidth  4 bytes OpDcMissOpenMemReqs  0 DcMissLat     0 TlbRefillLat     0
  IbsDCLinAd: 00007f133c319ce0
  IbsDCPhysAd: 0000000270485ce0

Committer notes:

Fixed up this:

  util/amd-sample-raw.c: In function ‘evlist__amd_sample_raw’:
  util/amd-sample-raw.c:125:42: error: ‘ bytes’ directive output may be truncated writing 6 bytes into a region of size between 4 and 7 [-Werror=format-truncation=]
    125 |                          " OpMemWidth %2d bytes", 1 << (reg.op_mem_width - 1));
        |                                          ^~~~~~
  In file included from /usr/include/stdio.h:866,
                   from util/amd-sample-raw.c:7:
  /usr/include/bits/stdio2.h:71:10: note: ‘__builtin___snprintf_chk’ output between 21 and 24 bytes into a destination of size 21
     71 |   return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
        |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     72 |                                    __glibc_objsize (__s), __fmt,
        |                                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     73 |                                    __va_arg_pack ());
        |                                    ~~~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

As that %2d won't limit the number of chars to 2, just state that 2 is
the minimal width:

  $ cat printf.c
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char *argv[])
  {
   char bf[64];
   int len = snprintf(bf, sizeof(bf), "%2d", atoi(argv[1]));

   printf("strlen(%s): %u\n", bf, len);

   return 0;
  }
  $ ./printf 1
  strlen( 1): 2
  $ ./printf 12
  strlen(12): 2
  $ ./printf 123
  strlen(123): 3
  $ ./printf 1234
  strlen(1234): 4
  $ ./printf 12345
  strlen(12345): 5
  $ ./printf 123456
  strlen(123456): 6
  $

And since we probably don't want that output to be truncated, just
assume the worst case, as the compiler did, and add a few more chars to
that buffer.

Also use sizeof(var) instead of sizeof(dup-of-wanted-format-string) to
avoid bugs when changing one but not the other.

I also had to change this:

  -#include <asm/amd-ibs.h>
  +#include "../../arch/x86/include/asm/amd-ibs.h"

To make it build on other architectures, just like intel-pt does.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https //lore.kernel.org/r/20210817221509.88391-4-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf report: Add tools/arch/x86/include/asm/amd-ibs.h
Kim Phillips [Tue, 17 Aug 2021 22:15:08 +0000 (17:15 -0500)]
perf report: Add tools/arch/x86/include/asm/amd-ibs.h

This is a tools/-side patch for the patch that adds the original copy
of the IBS header file, in arch/x86/include/asm/.

We also add an entry to check-headers.sh, so future changes continue
to be copied.

Committer notes:

Had to add this

  -#include <asm/msr-index.h>
  +#include "msr-index.h"

And change the check-headers.sh entry to ignore this line when diffing
with the original kernel header.

This is needed so that we can use 'perf report' on a perf.data with IBS
data on a !x86 system, i.e. building on ARM fails without this as there
is no asm/msr-index.h there.

This was done on the next patch in this series and is done for things
like Intel PT and ARM CoreSight.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https //lore.kernel.org/r/20210817221509.88391-3-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoMerge tag 'acpi-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 10 Sep 2021 20:29:04 +0000 (13:29 -0700)]
Merge tag 'acpi-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more ACPI updates from Rafael Wysocki:
 "These prevent a confusing PRMT-related message from being printed,
  drop an unnecessary header file include and update the list of ACPICA
  maintainers.

  Specifics:

   - Prevent a message about missing PRMT from being printed on systems
     that do not support PRM, which are the majority now (Aubrey Li).

   - Drop unnecessary header include from scan.c (Kari Argillander).

   - Update the list of ACPICA maintainers after recent departure of one
     of them (Rafael Wysocki)"

* tag 'acpi-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Update the list of maintainers
  ACPI: PRM: Find PRMT table before parsing it
  ACPI: scan: Remove unneeded header linux/nls.h

3 years agoMerge tag 'pm-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 10 Sep 2021 20:20:47 +0000 (13:20 -0700)]
Merge tag 'pm-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These improve hybrid processors support in intel_pstate, fix an issue
  in the core devices PM code, clean up the handling of dedicated wake
  IRQs, update the Energy Model documentation and update MAINTAINERS.

  Specifics:

   - Make the HWP performance levels calibration on hybrid processors in
     intel_pstate more straightforward (Rafael Wysocki).

   - Prevent the PM core from leaving devices in suspend after a failing
     system-wide suspend transition in some cases when driver PM flags
     are used (Prasad Sodagudi).

   - Drop unused function argument from the dedicated wake IRQs handling
     code (Sergey Shtylyov).

   - Fix up Energy Model kerneldoc comments and include them in the
     Energy Model documentation (Lukasz Luba).

   - Use my kernel.org address in MAINTAINERS insead of the personal one
     (Rafael Wysocki)"

* tag 'pm-5.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  MAINTAINERS: Change Rafael's e-mail address
  PM: sleep: core: Avoid setting power.must_resume to false
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

3 years agospi: dt-bindings: xilinx: Drop type reference on *-bits properties
Rob Herring [Fri, 10 Sep 2021 16:59:45 +0000 (11:59 -0500)]
spi: dt-bindings: xilinx: Drop type reference on *-bits properties

Properties with standard unit suffixes such as '-bits' don't need a
type.

Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: linux-spi@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210910165945.2852999-1-robh@kernel.org
3 years agodt-bindings: More use 'enum' instead of 'oneOf' plus 'const' entries
Rob Herring [Fri, 10 Sep 2021 16:51:53 +0000 (11:51 -0500)]
dt-bindings: More use 'enum' instead of 'oneOf' plus 'const' entries

'enum' is equivalent to 'oneOf' with a list of 'const' entries, but 'enum'
is more concise and yields better error messages.

Fix a couple more cases which have appeared.

Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <sean@poorly.run>
Cc: Mark Brown <broonie@kernel.org>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Marek <jonathan@marek.ca>
Cc: Aswath Govindraju <a-govindraju@ti.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Cc: linux-spi@vger.kernel.org
Cc: linux-watchdog@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210910165153.2843871-1-robh@kernel.org
3 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 10 Sep 2021 18:58:20 +0000 (11:58 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Limit the linear region to 51-bit when KVM is running in nVHE mode.

   Otherwise, depending on the placement of the ID map, kernel-VA to
   hyp-VA translations may produce addresses that either conflict with
   other HYP mappings or generate addresses outside of the 52-bit
   addressable range.

 - Instruct kmemleak not to scan the memory reserved for kdump as this
   range is removed from the kernel linear map and therefore not
   accessible.

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: kdump: Skip kmemleak scan reserved memory for kdump
  arm64: mm: limit linear region to 51 bits for KVM in nVHE mode

3 years agoMerge tag 'for-5.15/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Fri, 10 Sep 2021 18:52:01 +0000 (11:52 -0700)]
Merge tag 'for-5.15/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:

 - Build warning fixes in Makefile and Dino PCI driver

 - Fix when sched_clock is marked unstable

 - Drop strnlen_user() in favour of generic version

 - Prevent kernel to write outside userspace signal stack

 - Remove CONFIG_SET_FS including KERNEL_DS and USER_DS from parisc and
   switch to __get/put_kernel_nofault()

* tag 'for-5.15/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Implement __get/put_kernel_nofault()
  parisc: Mark sched_clock unstable only if clocks are not syncronized
  parisc: Move pci_dev_is_behind_card_dino to where it is used
  parisc: Reduce sigreturn trampoline to 3 instructions
  parisc: Check user signal stack trampoline is inside TASK_SIZE
  parisc: Drop useless debug info and comments from signal.c
  parisc: Drop strnlen_user() in favour of generic version
  parisc: Add missing FORCE prerequisite in Makefile

3 years agoMerge tag 'iommu-fixes-v5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 10 Sep 2021 18:42:03 +0000 (11:42 -0700)]
Merge tag 'iommu-fixes-v5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d:
     - PASID leakage in intel_svm_unbind_mm()
     - Deadlock in intel_svm_drain_prq()

 - AMD IOMMU: Fixes for an unhandled page-fault bug when AVIC is used
   for a KVM guest.

 - Make CONFIG_IOMMU_DEFAULT_DMA_LAZY architecture instead of IOMMU
   driver dependent

* tag 'iommu-fixes-v5.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu: Clarify default domain Kconfig
  iommu/vt-d: Fix a deadlock in intel_svm_drain_prq()
  iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm()
  iommu/amd: Remove iommu_init_ga()
  iommu/amd: Relocate GAMSup check to early_enable_iommus

3 years agoMerge tag 'char-misc-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 10 Sep 2021 18:31:47 +0000 (11:31 -0700)]
Merge tag 'char-misc-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull habanalabs updates from Greg KH:
 "Here is another round of misc driver patches for 5.15-rc1.

  In here is only updates for the Habanalabs driver. This request is
  late because the previously-objected-to dma-buf patches are all
  removed and some fixes that you and others found are now included in
  here as well.

  All of these have been in linux-next for well over a week with no
  reports of problems, and they are all self-contained to only this one
  driver. Full details are in the shortlog"

* tag 'char-misc-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (61 commits)
  habanalabs/gaudi: hwmon default card name
  habanalabs: add support for f/w reset
  habanalabs/gaudi: block ICACHE_BASE_ADDERESS_HIGH in TPC
  habanalabs: cannot sleep while holding spinlock
  habanalabs: never copy_from_user inside spinlock
  habanalabs: remove unnecessary device status check
  habanalabs: disable IRQ in user interrupts spinlock
  habanalabs: add "in device creation" status
  habanalabs/gaudi: invalidate PMMU mem cache on init
  habanalabs/gaudi: size should be printed in decimal
  habanalabs/gaudi: define DC POWER for secured PMC
  habanalabs/gaudi: unmask out of bounds SLM access interrupt
  habanalabs: add userptr_lookup node in debugfs
  habanalabs/gaudi: fetch TPC/MME ECC errors from F/W
  habanalabs: modify multi-CS to wait on stream masters
  habanalabs/gaudi: add monitored SOBs to state dump
  habanalabs/gaudi: restore user registers when context opens
  habanalabs/gaudi: increase boot fit timeout
  habanalabs: update to latest firmware headers
  habanalabs/gaudi: minimize number of register reads
  ...

3 years agoMerge branches 'acpi-scan' and 'acpi-prm'
Rafael J. Wysocki [Fri, 10 Sep 2021 18:27:07 +0000 (20:27 +0200)]
Merge branches 'acpi-scan' and 'acpi-prm'

* acpi-scan:
  ACPI: scan: Remove unneeded header linux/nls.h

* acpi-prm:
  ACPI: PRM: Find PRMT table before parsing it

3 years agoMerge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'
Rafael J. Wysocki [Fri, 10 Sep 2021 18:26:08 +0000 (20:26 +0200)]
Merge branches 'pm-cpufreq', 'pm-sleep' and 'pm-em'

* pm-cpufreq:
  cpufreq: intel_pstate: hybrid: Rework HWP calibration
  ACPI: CPPC: Introduce cppc_get_nominal_perf()

* pm-sleep:
  PM: sleep: core: Avoid setting power.must_resume to false
  PM: sleep: wakeirq: drop useless parameter from dev_pm_attach_wake_irq()

* pm-em:
  Documentation: power: include kernel-doc in Energy Model doc
  PM: EM: fix kernel-doc comments

3 years agoMerge tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 10 Sep 2021 18:22:23 +0000 (11:22 -0700)]
Merge tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Just an initial bunch of fixes for the merge window, amdgpu is most of
  them with a few ttm fixes and an fbdev avoid multiply overflow fix.

  core:
   - Make some dma-buf config options depend on DMA_SHARED_BUFFER
   - Handle multiplication overflow of fbdev xres/yres in the core

  ttm:
   - Fix ttm_bo_move_memcpy() when ttm_resource is subclassed
   - Fix ttm deadlock if target BO isn't idle
   - ttm build fix
   - ttm docs fix

  dma-buf:
   - config option fixes

  fbdev:
   - limit resolutions to avoid int overflow

  i915:
   - stddef change.

  amdgpu:
   - Misc cleanups, typo fixes
   - EEPROM fix
   - Add some new PCI IDs
   - Scatter/Gather display support for Yellow Carp
   - PCIe DPM fix for RKL platforms
   - RAS fix

  amdkfd:
   - SVM fix

  vc4:
   - static function fix

  mgag200:
   - fix uninit var

  panfrost:
   - lock_region fixes"

* tag 'drm-next-2021-09-10' of git://anongit.freedesktop.org/drm/drm: (36 commits)
  drm/ttm: Fix a deadlock if the target BO is not idle during swap
  fbmem: don't allow too huge resolutions
  dma-buf: DMABUF_SYSFS_STATS should depend on DMA_SHARED_BUFFER
  dma-buf: DMABUF_DEBUG should depend on DMA_SHARED_BUFFER
  drm/i915: use linux/stddef.h due to "isystem: trim/fixup stdarg.h and other headers"
  dma-buf: DMABUF_MOVE_NOTIFY should depend on DMA_SHARED_BUFFER
  drm/amdkfd: drop process ref count when xnack disable
  drm/amdgpu: enable more pm sysfs under SRIOV 1-VF mode
  drm/amdgpu: fix fdinfo race with process exit
  drm/amdgpu: Fix a deadlock if previous GEM object allocation fails
  drm/amdgpu: stop scheduler when calling hw_fini (v2)
  drm/amdgpu: Clear RAS interrupt status on aldebaran
  drm/amd/display: Initialize lt_settings on instantiation
  drm/amd/display: cleanup idents after a revert
  drm/amd/display: Fix memory leak reported by coverity
  drm/ttm: Fix ttm_bo_move_memcpy() for subclassed struct ttm_resource
  drm/amdgpu/swsmu: fix spelling mistake "minimun" -> "minimum"
  drm/amdgpu: Disable PCIE_DPM on Intel RKL Platform
  drm/amdgpu: show both cmd id and name when psp cmd failed
  drm/amd/display: setup system context for APUs
  ...

3 years agofsnotify: fix sb_connectors leak
Amir Goldstein [Thu, 9 Sep 2021 11:56:34 +0000 (14:56 +0300)]
fsnotify: fix sb_connectors leak

Fix a leak in s_fsnotify_connectors counter in case of a race between
concurrent add of new fsnotify mark to an object.

The task that lost the race fails to drop the counter before freeing
the unused connector.

Following umount() hangs in fsnotify_sb_delete()/wait_var_event(),
because s_fsnotify_connectors never drops to zero.

Fixes: fe1d2cba3fc5 ("fsnotify: count all objects with attached connectors")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/linux-fsdevel/20210907063338.ycaw6wvhzrfsfdlp@xzhoux.usersys.redhat.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3 years agoof: property: Disable fw_devlink DT support for X86
Saravana Kannan [Fri, 10 Sep 2021 01:14:45 +0000 (18:14 -0700)]
of: property: Disable fw_devlink DT support for X86

Andre reported fw_devlink=on breaking OLPC XO-1.5 [1].

OLPC XO-1.5 is an X86 system that uses a mix of ACPI and OF to populate
devices. The root cause seems to be ISA devices not setting their fwnode
field. But trying to figure out how to fix that doesn't seem worth the
trouble because the OLPC devicetree is very sparse/limited and fw_devlink
only adds the links causing this issue. Considering that there aren't many
users of OF in an X86 system, simply fw_devlink DT support for X86.

[1] - https://lore.kernel.org/lkml/3c1f2473-92ad-bfc4-258e-a5a08ad73dd0@web.de/

Fixes: 79e4b42bc148 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: Saravana Kannan <saravanak@google.com>
Cc: Andre Muller <andre.muller@web.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Andre Müller <andre.muller@web.de>
Link: https://lore.kernel.org/r/20210910011446.3208894-1-saravanak@google.com
Signed-off-by: Rob Herring <robh@kernel.org>
3 years agolkdtm: Use init_uts_ns.name instead of macros
Kees Cook [Wed, 1 Sep 2021 23:34:06 +0000 (16:34 -0700)]
lkdtm: Use init_uts_ns.name instead of macros

Using generated/compile.h triggered a full LKDTM rebuild with every
build. Avoid this by using the exported strings instead.

Fixes: 798f070b36b7 ("lkdtm: Add kernel version to failure hints")
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210901233406.2571643-1-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3 years agoperf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings
Kim Phillips [Tue, 17 Aug 2021 22:15:07 +0000 (17:15 -0500)]
perf env: Add perf_env__cpuid, perf_env__{nr_}pmu_mappings

To be used by IBS raw data display: It needs the recorder's cpuid in
order to determine which errata workarounds to apply to the data, and
the pmu_mappings are needed in order to figure out which PMU sample
type is IBS Fetch vs. IBS Op.

When not available from perf.data, we assume local operation, and
retrieve cpuid and pmu mappings directly from the running system.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https //lore.kernel.org/r/20210817221509.88391-2-kim.phillips@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
3 years agoperf symbol: Look for ImageBase in PE file to compute .text offset
Remi Bernon [Thu, 9 Sep 2021 19:26:36 +0000 (21:26 +0200)]
perf symbol: Look for ImageBase in PE file to compute .text offset

Instead of using the file offset in the debug file.

This fixes a regression from a8298b2fb1fc1d25 ("perf symbols: Make
dso__load_bfd_symbols() load PE files from debug cache only"), causing
incorrect symbol resolution when debug file have been stripped from
non-debug sections (in which case its .text section is empty and doesn't
have any file position).

The debug files could also be created with a different file alignment,
and have different file positions from the mmap-ed binary, or have the
section reordered.

This instead looks for the file image base, using the corresponding bfd
*ABS* symbols. As PE symbols only have 4 bytes, it also needs to keep
.text section vma high bits.

Signed-off-by: Remi Bernon <rbernon@codeweavers.com>
Fixes: a8298b2fb1fc1d25 ("perf symbols: Make dso__load_bfd_symbols() load PE files from debug cache only")
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Nicholas Fraser <nfraser@codeweavers.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210909192637.4139125-1-rbernon@codeweavers.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>