nilfs2: fix shift-out-of-bounds due to too large exponent of block size

[ Upstream commit cfb76f8216e4b01ca793b2f17bb362cc3949a305 ]

If the s_log_block_size field of the superblock data is corrupted and too
large, init_nilfs() and load_nilfs() can still trigger a shift-out-of-bounds
warning followed by a kernel panic (if panic_on_warn is set):

shift exponent 38973 is too large for 32-bit type 'int'
Call Trace:
  <TASK>
  dump_stack_lvl+0xcd/0x134
  ubsan_epilogue+0xb/0x50
  __ubsan_handle_shift_out_of_bounds.cold.12+0x17b/0x1f5
  init_nilfs.cold.11+0x18/0x1d [nilfs2]
  nilfs_mount+0x9b5/0x12b0 [nilfs2]
  ...

This fixes the issue by adding and using a new helper function for getting
block size with sanity check.
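
As a minimal userspace sketch of the idea (not the helper from the patch),
the exponent can be range-checked before it is ever used as a shift count.
The function name and both limits below are assumptions made for
illustration only:

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative limits, not taken from the patch: 1 KiB min, 64 KiB max. */
#define SKETCH_MIN_BLOCK_BITS 10u
#define SKETCH_MAX_BLOCK_BITS 16u

/*
 * Hypothetical helper: validate the on-disk exponent before shifting.
 * Returns 0 and stores the block size, or -EINVAL (leaving *blocksize
 * untouched) if the exponent is out of range.
 */
static int sketch_get_blocksize(uint32_t log_block_size, uint32_t *blocksize)
{
	if (log_block_size > SKETCH_MAX_BLOCK_BITS - SKETCH_MIN_BLOCK_BITS)
		return -EINVAL;

	*blocksize = UINT32_C(1) << (SKETCH_MIN_BLOCK_BITS + log_block_size);
	return 0;
}
```

With a check of this shape, a corrupted exponent such as 38973 is rejected
with an error instead of reaching the shift.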

Link: https://lkml.kernel.org/r/20221027044306.42774-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

nilfs2: fix shift-out-of-bounds/overflow in nilfs_sb2_bad_offset()

[ Upstream commit 2f4ab65fefb25a5d1ef24a8790672a77e3c92937 ]

Patch series "nilfs2: fix UBSAN shift-out-of-bounds warnings on mount
time".

The first patch fixes a bug reported by syzbot, and the second one fixes
the remaining bug of the same kind.  Although they are triggered by the
same super block data anomaly, I split the fix into two patches because
the details of the issues and how to fix them are different.

Both are required to eliminate the shift-out-of-bounds issues at mount
time.

This patch (of 2):

If the block size exponent information written in an on-disk superblock is
corrupted, the nilfs_sb2_bad_offset() helper function can trigger a
shift-out-of-bounds warning followed by a kernel panic (if panic_on_warn
is set):

shift exponent 38983 is too large for 64-bit type 'unsigned long long'
Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
  ubsan_epilogue lib/ubsan.c:151 [inline]
  __ubsan_handle_shift_out_of_bounds+0x33d/0x3b0 lib/ubsan.c:322
  nilfs_sb2_bad_offset fs/nilfs2/the_nilfs.c:449 [inline]
  nilfs_load_super_block+0xdf5/0xe00 fs/nilfs2/the_nilfs.c:523
  init_nilfs+0xb7/0x7d0 fs/nilfs2/the_nilfs.c:577
  nilfs_fill_super+0xb1/0x5d0 fs/nilfs2/super.c:1047
  nilfs_mount+0x613/0x9b0 fs/nilfs2/super.c:1317
  ...

In addition, since nilfs_sb2_bad_offset() performs multiplication without
considering the upper bound, the computation may overflow if the disk
layout parameters are not normal.

This fixes these issues by inserting preliminary sanity checks for those
parameters and by converting the comparison from one involving
multiplication and left bit-shifting to one using division and right
bit-shifting.
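
The conversion can be illustrated with a simplified userspace sketch. It is
not the code from the patch: the function name, the shift limit, and the
choice to treat invalid parameters as a bad offset are assumptions. For
non-negative integers, offset < (nsegments * blocks_per_segment) << shift
holds exactly when (offset >> shift) / blocks_per_segment < nsegments, and
the second form cannot overflow:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_SHIFT 16u /* illustrative upper bound, not from the patch */

/*
 * Hypothetical check: does byte `offset` fall inside the first
 * nsegments * blocks_per_segment blocks of (1 << shift) bytes each?
 * Out-of-range parameters are reported as "inside" so that the caller
 * rejects the layout (an assumption of this sketch).
 */
static bool sketch_offset_in_segments(uint64_t offset, uint64_t nsegments,
				      uint32_t blocks_per_segment,
				      uint32_t shift)
{
	if (shift > SKETCH_MAX_SHIFT || blocks_per_segment == 0)
		return true;

	/* Division and right shift only: no intermediate product to overflow. */
	return (offset >> shift) / blocks_per_segment < nsegments;
}
```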

Link: https://lkml.kernel.org/r/20221027044306.42774-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20221027044306.42774-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+e91619dd4c11c4960706@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: video: Add force_native quirk for Sony Vaio VPCY11S1E

[ Upstream commit 123d19a9214d7c171949439840f833a1e5e4222f ]

The Sony Vaio VPCY11S1E advertises both native and ACPI video backlight
control interfaces, but only the native interface works and the default
heuristics end up picking ACPI video on this model.

Add a video_detect_force_native DMI quirk for this.

Reported-by: Stefan Joosten <stefan@atcomputing.nl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: video: Add force_vendor quirk for Sony Vaio PCG-FRV35

[ Upstream commit 8c45d7476b10864fd383988500f22c798afd5e9a ]

The Sony Vaio PCG-FRV35 advertises both native and vendor backlight
control interfaces. With the upcoming changes to prefer native over
vendor, acpi_video_get_backlight_type() will start returning native on
these laptops.

But the native radeon_bl0 interface does not work, whereas the sony
vendor interface does work. Add a quirk to force use of the vendor
interface.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: video: Change Sony Vaio VPCEH3U1E quirk to force_native

[ Upstream commit 3cb96c6ac2ecaa5c04e5b0c2fcf4617c2fa2b094 ]

According to: https://bugzilla.kernel.org/show_bug.cgi?id=202401
the Sony Vaio VPCEH3U1E quirk was added to disable the acpi_video0
backlight interface because that was not working, so that userspace
will pick the actually working native nv_backlight interface instead.

With the new kernel behavior of hiding native interfaces unless
acpi_video_get_backlight_type() returns native, the current
video_detect_force_vendor quirk will cause the working nv_backlight
interface to be disabled too.

Change the quirk to video_detect_force_native to get the desired
result of only registering the nv_backlight interface.

After this change, all remaining force_vendor quirks in
video_detect_dmi_table[] are there to prefer a vendor interface over
a non-working ACPI video interface. Add a comment to document this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: video: Change GIGABYTE GB-BXBT-2807 quirk to force_none

[ Upstream commit 80d9a00fc5db0f9579cf161523b4ec70c899ce07 ]

The GIGABYTE GB-BXBT-2807 DMI quirk was added by
commit 7dabcbf0b409 ("ACPI: video: Add DMI quirk for GIGABYTE
GB-BXBT-2807") which says the following in its commit message:

"The GIGABYTE GB-BXBT-2807 is a mini-PC which uses off the shelf
components, like an Intel GPU which is meant for mobile systems.
As such, it, by default, has a backlight controller exposed.

Unfortunately, the backlight controller only confuses userspace, which
sees the existence of a backlight device node and has the unrealistic
belief that there is actually a backlight there!

Add a DMI quirk to force the backlight off on this system."

So in essence this quirk was using a video_detect_force_vendor quirk
to disable backlight control. Nowadays we have a specific "none"
backlight type for this. Change the quirk to video_detect_force_none
and group it together with the other force_none quirks.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp

[ Upstream commit 239b7117073315944ac7df1fdb23625fa6f9c7c4 ]

Calls to thermal_zone_get_temp() are not protected against thermal zone
device removal. As a result, it is possible that the thermal zone operations
callbacks are no longer valid when thermal_zone_get_temp() is called.
This may result in crashes such as

BUG: unable to handle page fault for address: ffffffffc04ef420
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 3209 Comm: cat Tainted: G        W         5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1
Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018
RIP: 0010:thermal_zone_get_temp+0x26/0x73
Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f
RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287
RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000
RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000
RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000
R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000
R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018
FS:  00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0
Call Trace:
temp_show+0x31/0x68
dev_attr_show+0x1d/0x4f
sysfs_kf_seq_show+0x92/0x107
seq_read_iter+0xf5/0x3f2
vfs_read+0x205/0x379
__x64_sys_read+0x7c/0xe2
do_syscall_64+0x43/0x55
entry_SYSCALL_64_after_hwframe+0x61/0xc6

if a thermal device is removed while accesses to its device attributes
are ongoing.

The problem is exposed by code in iwl_op_mode_mvm_start(), which registers
a thermal zone device only to unregister it shortly afterwards if an
unrelated failure is encountered while accessing the hardware.

Check if the thermal zone device is registered after acquiring the
thermal zone device mutex to ensure this does not happen.

The code was tested by triggering the failure in iwl_op_mode_mvm_start()
on purpose. Without this patch, the kernel crashes reliably. The crash
is no longer observed after applying this and the preceding patches.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPICA: Fix error code path in acpi_ds_call_control_method()

[ Upstream commit db293704d9576383acb153dec6cca153df2e0fea ]

A use-after-free in acpi_ps_parse_aml() after a failing invocation of
acpi_ds_call_control_method() is reported by KASAN [1] and code
inspection reveals that next_walk_state pushed to the thread by
acpi_ds_create_walk_state() is freed on errors, but it is not popped
from the thread beforehand. Thus acpi_ds_get_current_walk_state()
called by acpi_ps_parse_aml() subsequently returns it as the new
walk state which is incorrect.

To address this, make acpi_ds_call_control_method() call
acpi_ds_pop_walk_state() to pop next_walk_state from the thread before
returning an error.

Link: https://lore.kernel.org/linux-acpi/20221019073443.248215-1-chenzhongjin@huawei.com/
Reported-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Chen Zhongjin <chenzhongjin@huawei.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: EC: Add quirk for the HP Pavilion Gaming 15-cx0041ur

[ Upstream commit 1d72ae73eff6d2b85b4ff136f4367c3ebcd539f2 ]

Add a GPE quirk entry for the HP Pavilion Gaming 15-cx0041ur.
There is a quirk entry for the 15-cx0xxx laptops, but this one has
a different DMI_PRODUCT_NAME.

Notably, backlight keys and other ACPI events now function correctly.

Signed-off-by: Mia Kanashi <chad@redpilled.dev>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ACPI: processor: idle: Check acpi_fetch_acpi_dev() return value

[ Upstream commit 24978940ee4de4492af2592dde6bf4da4ffdb27e ]

The return value of acpi_fetch_acpi_dev() could be NULL, which would
cause a NULL pointer dereference to occur in acpi_device_hid().

Signed-off-by: Li Zhong <floridsleeves@gmail.com>
[ rjw: Subject and changelog edits, added empty line after if () ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: jfs: fix shift-out-of-bounds in dbDiscardAG

[ Upstream commit 2305913893202015cbc1d46f945e9260b7e21a67 ]

This should apply to most UBSAN bugs found recently by syzbot, by
guarding dbMount(), as syzbot is feeding rubbish into the bmap
descriptor.

Signed-off-by: Hoi Pok Wu <wuhoipok@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

jfs: Fix fortify moan in symlink

[ Upstream commit 2131d82ec1cb10bced884a8fe4911a69f3aae17e ]

JFS has in jfs_incore.h:

      /* _inline may overflow into _inline_ea when needed */
      /* _inline_ea may overlay the last part of
       * file._xtroot if maxentry = XTROOTINITSLOT
       */
      union {
        struct {
          /* 128: inline symlink */
          unchar _inline[128];
          /* 128: inline extended attr */
          unchar _inline_ea[128];
        };
        unchar _inline_all[256];

and currently the symlink code copies into _inline;
if this is larger than 128 bytes it triggers a fortify warning of the
form:

  memcpy: detected field-spanning write (size 132) of single field
     "ip->i_link" at fs/jfs/namei.c:950 (size 18446744073709551615)

when it's actually OK.

Copy it into _inline_all instead.
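
A small userspace sketch of the layout (struct and function names are
hypothetical, and the bounds check is an addition of this sketch) shows why
copying through the 256-byte member is well defined for a 132-byte link
target, while the bytes still land in the same storage as _inline and
_inline_ea:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the union quoted from jfs_incore.h. */
struct sketch_inode {
	union {
		struct {
			unsigned char _inline[128];	/* inline symlink */
			unsigned char _inline_ea[128];	/* inline extended attr */
		};
		unsigned char _inline_all[256];		/* both areas as one field */
	};
};

/* Copy a link target through the full-size member; -1 if it cannot fit. */
static int sketch_store_symlink(struct sketch_inode *ip, const void *target,
				size_t len)
{
	if (len > sizeof(ip->_inline_all))
		return -1;

	memcpy(ip->_inline_all, target, len);
	return 0;
}
```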

Reported-by: syzbot+5fc38b2ddbbca7f5c680@syzkaller.appspotmail.com
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

udf: Avoid double brelse() in udf_rename()

[ Upstream commit 1fd9f321cf1a2472d904363e41f3b538322d3408 ]

syzbot reported a warning like below [1]:

VFS: brelse: Trying to free free buffer
WARNING: CPU: 2 PID: 7301 at fs/buffer.c:1145 __brelse+0x67/0xa0
...
Call Trace:
<TASK>
invalidate_bh_lru+0x99/0x150
smp_call_function_many_cond+0xe2a/0x10c0
? generic_remap_file_range_prep+0x50/0x50
? __brelse+0xa0/0xa0
? __mutex_lock+0x21c/0x12d0
? smp_call_on_cpu+0x250/0x250
? rcu_read_lock_sched_held+0xb/0x60
? lock_release+0x587/0x810
? __brelse+0xa0/0xa0
? generic_remap_file_range_prep+0x50/0x50
on_each_cpu_cond_mask+0x3c/0x80
blkdev_flush_mapping+0x13a/0x2f0
blkdev_put_whole+0xd3/0xf0
blkdev_put+0x222/0x760
deactivate_locked_super+0x96/0x160
deactivate_super+0xda/0x100
cleanup_mnt+0x222/0x3d0
task_work_run+0x149/0x240
? task_work_cancel+0x30/0x30
do_exit+0xb29/0x2a40
? reacquire_held_locks+0x4a0/0x4a0
? do_raw_spin_lock+0x12a/0x2b0
? mm_update_next_owner+0x7c0/0x7c0
? rwlock_bug.part.0+0x90/0x90
? zap_other_threads+0x234/0x2d0
do_group_exit+0xd0/0x2a0
__x64_sys_exit_group+0x3a/0x50
do_syscall_64+0x34/0xb0
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The cause of the issue is that brelse() is called on both ofibh.sbh
and ofibh.ebh by udf_find_entry() when it returns NULL. However,
brelse() is called by udf_rename(), too. So, b_count on buffer_head
becomes unbalanced.

This patch fixes the issue by not calling brelse() in udf_rename()
when udf_find_entry() returns NULL.

Link: https://syzkaller.appspot.com/bug?id=8297f45698159c6bca8a1f87dc983667c1a1c851
Reported-by: syzbot+7902cd7684bc35306224@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221023095741.271430-1-syoshida@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

fs: jfs: fix shift-out-of-bounds in dbAllocAG

[ Upstream commit dc084389a58416d26d026f552eaec2f4979cd7e8 ]

Syzbot found a crash: UBSAN: shift-out-of-bounds in dbAllocAG. The
underlying bug is a missing check of bmp->db_agl2size. The field can
be greater than 64 and trigger the shift-out-of-bounds.

Fix this bug by adding a check of bmp->db_agl2size in dbMount(), since
this field is used in many subsequent functions. The upper bound for this
field is L2MAXL2SIZE - L2MAXAG (thanks to Dave Kleikamp for the help).
Note that, for maintainability, I reorganized the error handling code of
dbMount().
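
A rough userspace sketch of such a mount-time bound check follows. The
function names are hypothetical, the two constants are placeholders rather
than values copied from the JFS headers, and the negative-value test is an
addition of this sketch:

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder values for illustration; the real ones live in the JFS headers. */
#define SKETCH_L2MAXL2SIZE 43
#define SKETCH_L2MAXAG      7

/* Reject an allocation-group log2 size outside [0, L2MAXL2SIZE - L2MAXAG]. */
static int sketch_check_agl2size(int32_t agl2size)
{
	if (agl2size < 0 || agl2size > SKETCH_L2MAXL2SIZE - SKETCH_L2MAXAG)
		return -EINVAL;
	return 0;
}

/* Later users can shift safely once the field has been validated. */
static int sketch_ag_blocks(int32_t agl2size, int64_t *nblocks)
{
	int rc = sketch_check_agl2size(agl2size);

	if (rc)
		return rc;

	*nblocks = INT64_C(1) << agl2size;
	return 0;
}
```

Validating once at mount time means the functions that shift by this field
later do not each need their own check.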

Reported-by: syzbot+15342c1aa6a00fb7a438@syzkaller.appspotmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: qcom: sm6350: Add apps_smmu with streamID to SDHCI 1/2 nodes

[ Upstream commit 4a4e922af11f97f6d945769322b6157bf17d03e3 ]

When enabling the APPS SMMU, the mainline driver reconfigures the SMMU
from its bootloader configuration, losing the stream mapping for (among
others) the SDHCI hardware and breaking its ADMA feature. This feature
can be disabled with:

sdhci.debug_quirks=0x40

But it is of course desired to have this feature enabled and working
through the SMMU.

Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Reviewed-by: Luca Weiss <luca.weiss@fairphone.com>
Tested-by: Luca Weiss <luca.weiss@fairphone.com> # sm7225-fairphone-fp4
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221030073232.22726-11-marijn.suijten@somainline.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: qcom: sm8450: disable SDHCI SDR104/SDR50 on all boards

[ Upstream commit 38e903a6657b57f30de0925fde9be435f8d4e672 ]

SDHCI on SM8450 HDK also has problems with SDR104/SDR50:

mmc0: card never left busy state
mmc0: error -110 whilst initialising SD card

so I think it is safe to assume this issue affects all SM8450 boards.
Move the quirk disallowing these modes to the SoC DTSI, to spare people
working on other boards the misery of debugging this issue.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221026200357.391635-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

binfmt_misc: fix shift-out-of-bounds in check_special_flags

[ Upstream commit 4bf0c70cf494911936b2d5fa6963392c78ced752 ]

UBSAN reported a shift-out-of-bounds warning:

left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
  ubsan_epilogue+0xa/0x44 lib/ubsan.c:151
  __ubsan_handle_shift_out_of_bounds+0x1e7/0x208 lib/ubsan.c:322
  check_special_flags fs/binfmt_misc.c:241 [inline]
  create_entry fs/binfmt_misc.c:456 [inline]
  bm_register_write+0x9d3/0xa20 fs/binfmt_misc.c:654
  vfs_write+0x11e/0x580 fs/read_write.c:582
  ksys_write+0xcf/0x120 fs/read_write.c:637
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x34/0x80 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x4194e1

Since the type of Node's flags is unsigned long, we should define these
macros with the same type too.
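
As a short sketch with hypothetical macro names (the real flag macros are
defined in fs/binfmt_misc.c), giving the constant the same type as the
flags word makes the shift by 31 well defined:

```c
/*
 * (1 << 31) shifts a signed int into its sign bit, which UBSAN reports.
 * (1UL << 31) is an unsigned long, matching the type of the flags word.
 */
#define SKETCH_FLAG_HIGH (1UL << 31)
#define SKETCH_FLAG_NEXT (1UL << 30)

/* Set the selected flag bits in an unsigned long flags word. */
static unsigned long sketch_set_flags(unsigned long flags, int high, int next)
{
	if (high)
		flags |= SKETCH_FLAG_HIGH;
	if (next)
		flags |= SKETCH_FLAG_NEXT;
	return flags;
}
```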

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102025123.1117184-1-liushixin2@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/hyperv: Remove unregister syscore call from Hyper-V cleanup

[ Upstream commit 64435fc18561853c9ac9ccde6ed9108170b62187 ]

The Hyper-V cleanup code runs in the panic path, where preemption and IRQs
are already disabled. Calling unregister_syscore_ops() there might schedule
out the thread even when the mutex is free:

hyperv_cleanup
  unregister_syscore_ops
    mutex_lock(&syscore_ops_lock)
      might_sleep

Here might_sleep() might schedule out this thread when voluntary preemption
is configured, and the thread will never come back. The call was originally
added only to maintain symmetry, which is not required because this code
can be reached only from the crash shutdown path.

To prevent this, remove the unregister_syscore_ops() call.

Signed-off-by: Gaurav Kohli <gauravkohli@linux.microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1669443291-2575-1-git-send-email-gauravkohli@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

video: hyperv_fb: Avoid taking busy spinlock on panic path

[ Upstream commit a035c69578f85512e71461f8c271f19d4751576a ]

The Hyper-V framebuffer code registers a panic notifier in order
to try updating its fbdev if the kernel crashed. The notifier
callback is straightforward, but it calls the vmbus_sendpacket()
routine eventually, and such function takes a spinlock for the
ring buffer operations.

The panic path runs in atomic context, with local interrupts and
preemption disabled, and all secondary CPUs shut down. That said,
taking a spinlock might cause a lockup if a secondary CPU was
disabled with such a lock taken. Fix it here by checking if the
ring buffer spinlock is busy in the Hyper-V framebuffer panic notifier;
if so, bail out, avoiding the potential lockup scenario.

Cc: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Tianyu Lan <Tianyu.Lan@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Tested-by: Fabio A M Martins <fabiomirmar@gmail.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220819221731.480795-10-gpiccoli@igalia.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

ARM: dts: aspeed: rainier,everest: Move reserved memory regions

[ Upstream commit 428b7fcaeec214cfa2e052416ca651114630ef2d ]

Move the reserved regions to account for a decrease in DRAM when ECC is
enabled. ECC takes 1/9th of memory.

Running on HW with ECC off, u-boot prints:
DRAM: already initialized, 1008 MiB (capacity:1024 MiB, VGA:16 MiB, ECC:off)

And with ECC on, u-boot prints:
DRAM: already initialized, 896 MiB (capacity:1024 MiB, VGA:16 MiB, ECC:on, ECC size:896 MiB)

This implies that MCR54 is configured for ECC to be bounded at the
bottom of a 16MiB VGA memory region:

1024MiB - 16MiB (VGA) = 1008MiB
1008MiB / 9 (for ECC) = 112MiB
1008MiB - 112MiB = 896MiB (available DRAM)

The flash_memory region currently starts at offset 896MiB:
0xb8000000 (flash_memory offset) - 0x80000000 (base memory address) = 0x38000000 = 896MiB

This is the end of the available DRAM with ECC enabled and therefore it
needs to be moved.

Since the flash_memory is 64MiB in size and needs to be 64MiB aligned,
it can just be moved up by 64MiB and would sit right at the end of the
available DRAM buffer.
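
The arithmetic above can be reproduced with a small sketch. The function
names are hypothetical, and it assumes that "right at the end" means the
region's last byte coincides with the last byte of ECC-usable DRAM:

```c
#include <stdint.h>

#define SKETCH_MIB (UINT32_C(1) << 20)

/* DRAM left for the system once VGA memory and the 1/9 ECC share are removed. */
static uint32_t sketch_usable_dram(uint32_t capacity, uint32_t vga)
{
	uint32_t visible = capacity - vga;

	return visible - visible / 9;
}

/*
 * Highest base address for a region of `size` bytes (a power of two, also
 * used as the alignment) that still ends inside usable DRAM.
 */
static uint32_t sketch_region_base(uint32_t dram_base, uint32_t usable,
				   uint32_t size)
{
	uint32_t end = dram_base + usable;

	return (end - size) & ~(size - 1);
}
```

For a 1024 MiB part with 16 MiB of VGA memory this gives 896 MiB of usable
DRAM, so under these assumptions a 64 MiB region at the end would start at
0xb4000000.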

The ramoops region currently follows the flash_memory, but it can be
moved to sit above flash_memory which would minimize the address-space
fragmentation.

Signed-off-by: Adriana Kobylak <anoo@us.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Link: https://lore.kernel.org/r/20220916195535.1020185-1-anoo@linux.ibm.com
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: make is_ttbrX_addr() noinstr-safe

[ Upstream commit 54626d6a928d8e508f7071234a259ec18d3ecd22 ]

We use is_ttbr0_addr() in noinstr code, but as it's only marked as
inline, it's theoretically possible for the compiler to place it
out-of-line and instrument it, which would be problematic.

Mark is_ttbr0_addr() as __always_inline so that it can safely be
used from noinstr code. For consistency, do the same to is_ttbr1_addr().
Note that while is_ttbr1_addr() calls arch_kasan_reset_tag(), this is a
macro (and its callees are either macros or __always_inline), so there
is not a risk of transient instrumentation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221114144042.3001140-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()

[ Upstream commit c508ace9b0ebd45893ea7fe54a30613d62ddaddc ]

Running rcutorture with non-zero fqs_duration module parameter in a
kernel built with CONFIG_PREEMPTION=y results in the following splat:

BUG: using __this_cpu_read() in preemptible [00000000]
code: rcu_torture_fqs/398
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 398 Comm: rcu_torture_fqs Not tainted 6.0.0-rc1-yoctodev-standard+
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x86
dump_stack+0x10/0x16
check_preemption_disabled+0xe5/0xf0
__this_cpu_preempt_check+0x13/0x20
rcu_force_quiescent_state.part.0+0x1c/0x170
rcu_force_quiescent_state+0x1e/0x30
rcu_torture_fqs+0xca/0x160
? rcu_torture_boost+0x430/0x430
kthread+0x192/0x1d0
? kthread_complete_and_exit+0x30/0x30
ret_from_fork+0x22/0x30
</TASK>

The problem is that rcu_force_quiescent_state() uses __this_cpu_read()
in preemptible code instead of the proper raw_cpu_read(). This commit
therefore changes __this_cpu_read() to raw_cpu_read().

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: fec: check the return value of build_skb()

[ Upstream commit 05b4f270051a4e00fd720e40bae310b10fdfbd02 ]

build_skb() might return a NULL pointer, but fec_enet_rx_queue() does not
check the return value, so a NULL pointer dereference might occur. To avoid
this, check the return value of build_skb(). If it is NULL, the driver
recycles the page and updates the ndev statistics, then jumps to
rx_processing_done to clear the status flags of the BD so that the hardware
can recycle the BD.

Fixes: 377f42a03386 ("net: fec: using page pool to manage RX buffers")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20221219022755.1047573-1-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

HID: amd_sfh: Add missing check for dma_alloc_coherent

[ Upstream commit d3888c3b121098ab88d41c0d5715a122c12fafd8 ]

Add a check for the return value of dma_alloc_coherent(), since
it may return a NULL pointer if the allocation fails.

Fixes: e3cc2aa00c0c ("SFH:Transport Driver to add support of AMD Sensor Fusion Hub (SFH)")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Acked-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20221220024921.21992-1-jiasheng@iscas.ac.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>

mctp: Remove device type check at unregister

[ Upstream commit 253a3aaed41fbb7bfa70e68e0feb9266e05ff512 ]

The unregister check could be incorrectly triggered if a netdev
changes its type after register. That is possible for a tun device
using TUNSETLINK ioctl, resulting in mctp unregister failing
and the netdev unregister waiting forever.

This was encountered in https://github.com/openthread/openthread/issues/8523

Neither the check at register nor the one at unregister is required. They
were added in an attempt to track down mctp_ptr being set unexpectedly,
which should not happen in normal operation.

Fixes: d9d7f5f8dccf ("mctp: Warn if pointer is set for a wrong dev type")
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au>
Link: https://lore.kernel.org/r/20221215054933.2403401-1-matt@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq

[ Upstream commit a2c336b59ccec01b598ebdc9ff960191458cdc66 ]

KSZ switches use interrupts for detecting PHY link up and down events.
When registering the interrupt handler, the driver used the
IRQF_TRIGGER_FALLING flag. But this flag has to be retrieved from the
device tree instead of being hard-coded in the driver, so remove the flag.

Fixes: 2169f48a4579 ("net: dsa: microchip: move interrupt handling logic from lan937x to ksz_common")
Reported-by: Christian Eggers <ceggers@arri.de>
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Link: https://lore.kernel.org/r/20221213101440.24667-1-arun.ramadoss@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

cifs: don't leak -ENOMEM in smb2_open_file()

[ Upstream commit 8b5a71257ea9478f4fc27679ecf706b0be0effa5 ]

A NULL error response might be a valid case where smb2_reconnect()
failed to reconnect the session and tcon due to a disconnected server
prior to issuing the I/O operation, so don't leak -ENOMEM to userspace
on such occasions.

Fixes: 2f42339dd042 ("cifs: improve symlink handling for smb2+")
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mctp: serial: Fix starting value for frame check sequence

[ Upstream commit 775a00ec642f7ae9b4655ea67496647cb054684e ]

RFC1662 defines the start state for the crc16 FCS to be 0xffff, but
we're currently starting at zero.

This change uses the correct start state. We're still early in the
adoption of the serial binding, so there aren't yet any other users to
interface with.
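
To show how much the start state matters, here is a bitwise sketch of a
16-bit FCS over the reflected CCITT polynomial (0x8408). The function name
is hypothetical, this is not the driver's code, and any final complement of
the result is left out:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FCS_INIT 0xffffu /* start state required by RFC 1662 */

/* Feed `len` bytes into a running FCS, least significant bit first. */
static uint16_t sketch_fcs16_update(uint16_t fcs, const uint8_t *buf,
				    size_t len)
{
	for (size_t i = 0; i < len; i++) {
		fcs ^= buf[i];
		for (int bit = 0; bit < 8; bit++) {
			if (fcs & 1)
				fcs = (uint16_t)((fcs >> 1) ^ 0x8408);
			else
				fcs = (uint16_t)(fcs >> 1);
		}
	}
	return fcs;
}
```

Over the ASCII string "123456789", a start state of 0xffff yields 0x6f91,
while a start state of zero yields 0x2189, so two peers using different
start states would reject each other's frames.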

Fixes: 787c61be4378 ("mctp: Add MCTP-over-serial transport binding")
Reported-by: Harsh Tyagi <harshtya@google.com>
Tested-by: Harsh Tyagi <harshtya@google.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: stream: purge sk_error_queue in sk_stream_kill_queues()

[ Upstream commit 6ea88d67f035e37152a5c2317a665d2b0c5c2045 ]

Changheon Lee reported TCP socket leaks, with a nice repro.

It seems we leak TCP sockets with the following sequence:

1) SOF_TIMESTAMPING_TX_ACK is enabled on the socket.

   Each ACK will cook an skb put in error queue, from __skb_tstamp_tx().
   __skb_tstamp_tx() is using skb_clone(), unless
   SOF_TIMESTAMPING_OPT_TSONLY was also requested.

2) If the application is also using MSG_ZEROCOPY, then we put in the
   error queue cloned skbs that had a struct ubuf_info attached to them.

   Whenever a struct ubuf_info is allocated, sock_zerocopy_alloc()
   does a sock_hold().

   As long as the cloned skbs are still in sk_error_queue,
   socket refcount is kept elevated.

3) Application closes the socket, while error queue is not empty.

Since tcp_close() no longer purges the socket error queue,
we might end up with a TCP socket with at least one skb in
error queue keeping the socket alive forever.

This bug can be (ab)used to consume all kernel memory
and freeze the host.

We need to purge the error queue, with proper synchronization
against concurrent writers.

Fixes: 2d3e867815a5 ("net: stream: don't purge sk_error_queue in sk_stream_kill_queues()")
Reported-by: Changheon Lee <darklight2357@icloud.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

myri10ge: Fix an error handling path in myri10ge_probe()

[ Upstream commit 6dd1cac10c82c410ffe97e96d158d63760881ce4 ]

Some memory allocated in myri10ge_probe_slices() is not released in the
error handling path of myri10ge_probe().

Add the corresponding kfree(), as already done in the remove function.

Fixes: 8fdfa5a90e64 ("myri10ge: add multislices support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rxrpc: Fix missing unlock in rxrpc_do_sendmsg()

[ Upstream commit f9fa29b8e9a62e7ad4d0a277efd98fb6249b1003 ]

One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex
before returning. Fix it to do this.

Note that this still doesn't get rid of the checker warning:

../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit

I think the interplay between the socket lock and the call's user_mutex may
be too complicated for checker to analyse, especially as
rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the
call's user_mutex held if successful but unconditionally drops the socket
lock.

Fixes: bdb5995807ae ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net_sched: reject TCF_EM_SIMPLE case for complex ematch module

[ Upstream commit 590216a95a339d74bc4762e6a9cb393909d7ce56 ]

When TCF_EM_SIMPLE was introduced, it was meant as a convenience
for ematch implementations:

https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/

"You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE
set will simply result in allocating & copy. It's an optimization,
nothing more."

So if an ematch module provides ops->datalen that means it wants a
complex data structure (saved in its em->data) instead of a simple u32
value. We should simply reject such a combination, otherwise this u32
could be misinterpreted as a pointer.
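
The rejection rule can be sketched in plain userspace C. This is only an
illustration under assumed names: struct ematch_ops_sketch, EM_SIMPLE_FLAG
and validate_ematch() are hypothetical stand-ins, not the kernel's
definitions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the TCF_EM_SIMPLE flag bit. */
#define EM_SIMPLE_FLAG 0x1u

/* Hypothetical, reduced view of an ematch module's ops. */
struct ematch_ops_sketch {
	size_t datalen;	/* non-zero: module expects a structure of this size */
};

/*
 * Return 0 if the flag/ops combination is acceptable, or -EINVAL if a
 * module that wants a full data structure is handed a bare u32.
 */
int validate_ematch(const struct ematch_ops_sketch *ops, unsigned int flags)
{
	if ((flags & EM_SIMPLE_FLAG) && ops->datalen > 0)
		return -EINVAL;
	return 0;
}
```

A module with datalen == 0 still accepts the simple form; only the
combination of the simple flag with a non-zero datalen is refused.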

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com
Reported-by: Jun Nie <jun.nie@linaro.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mailbox: zynq-ipi: fix error handling while device_register() fails

[ Upstream commit 7457d21949961aa7dcbe885b2453f4d0dad35ad3 ]

If device_register() fails, there are two issues:
1. The name allocated by dev_set_name() is leaked.
2. The parent of the device is not NULL, so device_unregister() is called
in zynqmp_ipi_free_mboxes(), which leads to a kernel crash because a
device that was never added is removed.

Call put_device() to give up the reference, so the name is freed in
kobject_cleanup(). Add device registered check in zynqmp_ipi_free_mboxes()
to avoid null-ptr-deref.

Fixes: 4a364c29c7e1 ("mailbox: ZynqMP IPI mailbox controller")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mailbox: arm_mhuv2: Fix return value check in mhuv2_probe()

[ Upstream commit a8f0e913639dd162e187261b9f279e1d1377670f ]

If devm_of_iomap() fails, it returns ERR_PTR() and never
returns NULL, so replace the NULL pointer check with IS_ERR()
to fix this problem.
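
Why the NULL check never fires can be shown with a userspace imitation of
the error-pointer convention. The helpers below (err_ptr_sketch(),
is_err_sketch(), ptr_err_sketch(), fake_iomap()) are hypothetical
stand-ins written for this illustration; they are not the kernel macros
or devm_of_iomap() itself.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno in the top 4095 values of the address space. */
void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical mapper: returns an error pointer on failure, never NULL. */
void *fake_iomap(int fail)
{
	static int fake_register;

	return fail ? err_ptr_sketch(-ENOMEM) : (void *)&fake_register;
}

/* Old shape of the check: an error pointer is not NULL, so it is missed. */
int probe_with_null_check(int fail)
{
	void *base = fake_iomap(fail);

	if (!base)
		return -ENOMEM;
	return 0;
}

/* Fixed shape of the check: the encoded errno is detected and propagated. */
int probe_with_is_err(int fail)
{
	void *base = fake_iomap(fail);

	if (is_err_sketch(base))
		return (int)ptr_err_sketch(base);
	return 0;
}
```

With a failing mapper, probe_with_null_check() reports success while
probe_with_is_err() returns -ENOMEM.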

Fixes: 2d8817363fe1 ("mailbox: arm_mhuv2: Add driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mailbox: mpfs: read the system controller's status

[ Upstream commit a575777b119d81400cdaeb359f62485f8f18058e ]

Some services explicitly return an error code in their response, but
others rely on the system controller to set a status in its status
register. The meaning of the bits varies based on what service is
requested, so pass it back up to the driver that requested the service
in the first place. The field in the message struct already existed, but
was unused until now.

If the system controller is busy, in which case we should never actually
be in the interrupt handler, or if the service fails the mailbox itself
should not be read. Callers should check the status before operating on
the response.

There's an existing, but unused, #define for the mailbox mask - but it
was incorrect. It was doing a GENMASK_ULL(32, 16) which should've just
been a GENMASK(31, 16), so fix that up and start using it.
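
The difference between the two masks is one extra bit. The macro below is
a userspace imitation written for this note (GENMASK_U64_SKETCH is a
hypothetical name); it assumes the usual convention that both bit
positions are inclusive.

```c
#include <stdint.h>

/* Set bits l..h inclusive in a 64-bit value (userspace imitation). */
#define GENMASK_U64_SKETCH(h, l) \
	(((~UINT64_C(0)) << (l)) & ((~UINT64_C(0)) >> (63 - (h))))

/* Bits 16..32: 17 bits wide, spills into bit 32. */
uint64_t old_mailbox_mask(void)
{
	return GENMASK_U64_SKETCH(32, 16);
}

/* Bits 16..31: the upper half of a 32-bit register. */
uint64_t new_mailbox_mask(void)
{
	return GENMASK_U64_SKETCH(31, 16);
}
```

Under these assumptions the old mask evaluates to 0x1FFFF0000 and the new
one to 0xFFFF0000.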

Fixes: 2ff953ddb96d ("mbox: add polarfire soc system controller mailbox")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

skbuff: Account for tail adjustment during pull operations

[ Upstream commit b3f5a4f0eecacf1d4514f8e58c8d5b4e07a74399 ]

Extending the tail can have some unexpected side effects if a program uses
a helper like BPF_FUNC_skb_pull_data to read partial content beyond the
head skb headlen when all the skbs in the gso frag_list are linear with no
head_frag -

  kernel BUG at net/core/skbuff.c:4219!
  pc : skb_segment+0xcf4/0xd2c
  lr : skb_segment+0x63c/0xd2c
  Call trace:
   skb_segment+0xcf4/0xd2c
   __udp_gso_segment+0xa4/0x544
   udp4_ufo_fragment+0x184/0x1c0
   inet_gso_segment+0x16c/0x3a4
   skb_mac_gso_segment+0xd4/0x1b0
   __skb_gso_segment+0xcc/0x12c
   udp_rcv_segment+0x54/0x16c
   udp_queue_rcv_skb+0x78/0x144
   udp_unicast_rcv_skb+0x8c/0xa4
   __udp4_lib_rcv+0x490/0x68c
   udp_rcv+0x20/0x30
   ip_protocol_deliver_rcu+0x1b0/0x33c
   ip_local_deliver+0xd8/0x1f0
   ip_rcv+0x98/0x1a4
   deliver_ptype_list_skb+0x98/0x1ec
   __netif_receive_skb_core+0x978/0xc60

Fix this by marking these skbs as GSO_DODGY so segmentation can handle
the tail updates accordingly.

Fixes: ffb6eacc0ec6 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com>
Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/1671084718-24796-1-git-send-email-quic_subashab@quicinc.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

devlink: protect devlink dump by the instance lock

[ Upstream commit 5f38d8ed4d14ce1f353f739d1e588890af74fd2e ]

Take the instance lock around devlink_nl_fill() when dumping;
doit already takes it.

We are only dumping basic info so in the worst case we were risking
data races around the reload statistics. Until the big devlink mutex
was removed all relevant code was protected by it, so the missing
instance lock was not exposed.

Fixes: 7400d837e401 ("net: devlink: remove devlink_mutex")
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20221216044122.1863550-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

arm64: dts: mt8183: Fix Mali GPU clock

[ Upstream commit 05e81feb3153b9d8d578ba3150837c3def6fca47 ]

The actual clock feeding into the Mali GPU on the MT8183 is from the
clock gate in the MFGCFG block, not CLK_TOP_MFGPLL_CK from the TOPCKGEN
block, which itself is simply a pass-through placeholder for the MFGPLL
in the APMIXEDSYS block.

Fix the hardware description with the correct clock reference.

Fixes: 60f64a3aec20 ("arm64: dts: mt8183: Add node for the Mali GPU")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Link: https://lore.kernel.org/r/20220927101128.44758-2-angelogioacchino.delregno@collabora.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

soc: mediatek: pm-domains: Fix the power glitch issue

[ Upstream commit 423cd95178410aaa96242c545c8403e77d0a4912 ]

A power reset may generate an unexpected signal. To avoid the glitch
issue, we need to enable isolation first to guarantee a stable signal
when the power reset is triggered.

Fixes: 75d995b42f01 ("soc: mediatek: Add MediaTek SCPSYS power domains")
Signed-off-by: Chun-Jie Chen <chun-jie.chen@mediatek.com>
Signed-off-by: Allen-KH Cheng <allen-kh.cheng@mediatek.com>
Reviewed-by: Chen-Yu Tsai <wenst@chromium.org>
Reviewed-by: Miles Chen <miles.chen@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/20221014102029.1162-1-allen-kh.cheng@mediatek.com
Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

openvswitch: Fix flow lookup to use unmasked key

[ Upstream commit bcdd08288440f183791d1feedc9ba8ec6fdc0549 ]

The commit mentioned below causes the ovs_flow_tbl_lookup() function
to be called with the masked key. However, it's supposed to be called
with the unmasked key. This is due to the fact that the datapath supports
installing wider flows, and OVS relies on this behavior. For example
if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
128.0.0.0) is allowed to be added.

However, if we try to add a wildcard rule, the installation fails:

$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)

The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed.
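
The false match can be reproduced with the addresses from the example
above, reduced to the source address only. This is a simplified sketch:
struct flow_sketch, ip4(), flow_matches() and existing_flow() are
hypothetical and model a single masked field, not the datapath's flow
table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a dotted quad into a host-order 32-bit value. */
uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/* Hypothetical one-field flow: the stored key is already masked. */
struct flow_sketch {
	uint32_t masked_src;
	uint32_t mask;
};

/* A lookup key hits a flow if it equals the stored key under its mask. */
bool flow_matches(const struct flow_sketch *flow, uint32_t lookup_src)
{
	return (lookup_src & flow->mask) == flow->masked_src;
}

/* The already installed flow: src=1.1.1.1/192.0.0.0. */
struct flow_sketch existing_flow(void)
{
	struct flow_sketch flow;

	flow.mask = ip4(192, 0, 0, 0);
	flow.masked_src = ip4(1, 1, 1, 1) & flow.mask;
	return flow;
}
```

Looking up the new flow's key after ANDing it with its own 0.0.0.0 mask
yields an all-zero key, which hits the existing flow. Looking up the
unmasked 192.1.1.1 does not, so the wildcard flow can be installed.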

This change reverses the commit below, but rather than having the key
on the stack, it's allocated.

Fixes: 67cebb18a580 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests: devlink: fix the fd redirect in dummy_reporter_test

[ Upstream commit 704bd33064029af3bc92c86cffc4288347e225e0 ]

In bash, $number followed by > redirects FD $number; e.g. the commonly
used 2> redirects stderr (fd 2). The test uses 8192> to
write the number 8192 to a file, which results in:

./devlink.sh: line 499: 8192: Bad file descriptor

Oddly the test also papers over this issue by checking
for failure (expecting an error rather than success)
so it passes, anyway.

Fixes: 3f87059948cc ("selftests: Add a test of large binary to devlink health test")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

devlink: hold region lock when flushing snapshots

[ Upstream commit cbb210e2b028f0a7b07696f5db8ace0308960e9e ]

Netdevsim triggers a splat on reload, when it destroys regions
with snapshots pending:

  WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140
  CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d6e1c #580
  RIP: 0010:devlink_region_snapshot_del+0x12e/0x140
  Call Trace:
   <TASK>
   devl_region_destroy+0x70/0x140
   nsim_dev_reload_down+0x2f/0x60 [netdevsim]
   devlink_reload+0x1f7/0x360
   devlink_nl_cmd_reload+0x6ce/0x860
   genl_family_rcv_msg_doit.isra.0+0x145/0x1c0

This is the locking assert in devlink_region_snapshot_del();
we're supposed to be holding the region->snapshot_lock here.

Fixes: 5b9505d51a2b ("net: devlink: remove region snapshots list dependency on devlink->lock")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtc: mxc_v2: Add missing clk_disable_unprepare()

[ Upstream commit f6b90a21d8f665aafc51332ad26c7b15b5620f43 ]

The call to clk_disable_unprepare() is left out in the error handling of
devm_rtc_allocate_device(). Add it back.

Fixes: 52e29de39fed ("rtc: mxc_v2: fix possible race condition")
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Link: https://lore.kernel.org/r/20221122085046.21689-1-guozihua@huawei.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Set Qbv start_time and end_time to end_time if not being configured in GCL

[ Upstream commit caf68ee0ad698f3f038d6387ca2d8ab7fc872aba ]

By default, end_time minus start_time is a whole 1 second.
Thus, if it is not configured in any GCL entry, it stays at the
original 1 second.

This patch sets both start_time and end_time to end_time, because
setting them to zero leads to odd HW behavior where the gate is
not fully closed.

Fixes: e43ec0122e9e ("igc: Add support for taprio offloading")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: recalculate Qbv end_time by considering cycle time

[ Upstream commit 9d6d95a675e9f4491548cca8963cb3210c9a4e86 ]

Qbv users can specify a cycle time that is not equal to the total GCL
intervals. Hence, recalculation is necessary here to exclude the time
interval that exceeds the cycle time, since GCL entries that exceed the
cycle time will be truncated.

According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of
the list is reached, it will switch to the END_OF_CYCLE state and
leave the gates in the same state until the next cycle is started.
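
The two rules above can be sketched as a small helper. This is an
illustration only, assuming per-entry intervals and a cycle time in the
same unit; gcl_entry_end_time() and its parameters are hypothetical names
and not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the end time of GCL entry n from the per-entry intervals and
 * the cycle time (all in the same unit, e.g. nanoseconds).
 */
uint64_t gcl_entry_end_time(const uint32_t *intervals, size_t num_entries,
			    size_t n, uint64_t cycle_time)
{
	uint64_t end_time = 0;
	size_t i;

	/* Entries run back to back, so the end time is a running sum. */
	for (i = 0; i <= n && i < num_entries; i++)
		end_time += intervals[i];

	/*
	 * Truncate whatever runs past the cycle. The last entry keeps its
	 * gate state until the cycle ends, so it also ends at cycle_time.
	 */
	if (end_time > cycle_time || n + 1 == num_entries)
		end_time = cycle_time;

	return end_time;
}
```

For intervals of 300, 300 and 300 with a cycle time of 700, the third
entry ends at 700 instead of 900. For intervals of 100 and 200 with a
cycle time of 1000, the last entry is stretched to end at 1000.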

Fixes: e43ec0122e9e ("igc: Add support for taprio offloading")
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: allow BaseTime 0 enrollment for Qbv

[ Upstream commit 15a52ad3ba6dee5e83ab190097220a8615345b82 ]

Introduce a qbv_enable flag in the igc_adapter struct to store the Qbv
on/off state. This allows BaseTime to be enrolled with a value of zero.

Fixes: 0caa2a690eec ("igc: Simplify TSN flags handling")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Add checking for basetime less than zero

[ Upstream commit 4a53250dc4cbb1028f035bef760ba2e01243a850 ]

Using the tc qdisc command, the user can set basetime to any value.
Checking should be done on the driver's side to prevent registering
basetime values that are less than zero.

Fixes: e43ec0122e9e ("igc: Add support for taprio offloading")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

igc: Use strict cycles for Qbv scheduling

[ Upstream commit 9d15e23e0d3be1405e4c6e28f62805f664185361 ]

Configuring strict cycle mode in the controller forces more well
behaved transmissions when taprio is offloaded.

When strict_cycle and strict_end are set, transmission is not
enabled if the whole packet cannot be completed before the end of
the Qbv cycle.

Fixes: ed94c2313883 ("igc: Add support for ETF offloading")
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com>
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()

[ Upstream commit 42d5edf3bbde6206ddf6e39ecc3db18cd36fc3af ]

In the blamed commit, it was not noticed that one implementation of
chip->info->ops->phylink_get_caps(), called by mv88e6xxx_get_caps(),
may access hardware registers, and in doing so, it takes the
mv88e6xxx_reg_lock(). Namely, this is mv88e6352_phylink_get_caps().

This is a problem because mv88e6xxx_get_caps(), apart from being
a top-level function (method invoked by dsa_switch_ops), is now also
directly called from mv88e6xxx_setup_port(), which runs under the
mv88e6xxx_reg_lock() taken by mv88e6xxx_setup(). Therefore, when running
on mv88e6352, the reg_lock would be acquired a second time and the
system would deadlock on driver probe.

The things that mv88e6xxx_setup() can compete with in terms of register
access are the IRQ handlers and MDIO bus operations registered by
mv88e6xxx_probe(). So there is a real need to acquire the register lock.

The register lock can, in principle, be dropped and re-acquired pretty
much at will within the driver, as long as no operations that involve
waiting for indirect access to complete (essentially, callers of
mv88e6xxx_smi_direct_wait() and mv88e6xxx_wait_mask()) are interrupted
with the lock released. However, I would guess that in mv88e6xxx_setup(),
the critical section is kept open for such a long time just in order to
optimize away multiple lock/unlock operations on the registers.

We could, in principle, drop the reg_lock right before the
mv88e6xxx_setup_port() -> mv88e6xxx_get_caps() call, and
re-acquire it immediately afterwards. But this would look ugly, because
mv88e6xxx_setup_port() would release a lock which it didn't acquire, but
the caller did.

A cleaner solution to this issue comes from the observation that struct
mv88e6xxx_ops methods generally assume they are called with the
reg_lock already acquired, whereas mv88e6352_phylink_get_caps() is the
exception rather than the norm, in that it acquires the lock itself.

Let's enforce the same locking pattern/convention for
chip->info->ops->phylink_get_caps() as well, and make
mv88e6xxx_get_caps(), the top-level function, acquire the register lock
explicitly, for this one implementation that will access registers for
port 4 to work properly.

This means that mv88e6xxx_setup_port() will no longer call the top-level
function, but the low-level mv88e6xxx_ops method which expects the
correct calling context (register lock held).
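
The convention can be sketched with a flag standing in for the register
mutex. Everything here is hypothetical (struct chip_sketch and the
functions around it); a real non-recursive mutex would deadlock where
this sketch merely counts a relock attempt.

```c
#include <stdbool.h>

struct chip_sketch {
	bool reg_locked;	/* stands in for the register mutex */
	int relock_attempts;	/* a real mutex would deadlock instead */
};

void reg_lock(struct chip_sketch *chip)
{
	if (chip->reg_locked)
		chip->relock_attempts++;
	chip->reg_locked = true;
}

void reg_unlock(struct chip_sketch *chip)
{
	chip->reg_locked = false;
}

/* Low-level op: the caller must already hold the register lock. */
unsigned int ops_get_caps(struct chip_sketch *chip)
{
	return chip->reg_locked ? 0x3u : 0u;	/* 0 flags a locking bug */
}

/* Top-level entry point: takes the lock around the low-level op. */
unsigned int top_get_caps(struct chip_sketch *chip)
{
	unsigned int caps;

	reg_lock(chip);
	caps = ops_get_caps(chip);
	reg_unlock(chip);
	return caps;
}

/* Runs with the lock held by its caller, so it uses the low-level op. */
unsigned int setup_port(struct chip_sketch *chip)
{
	return ops_get_caps(chip);
}

/* Holds the lock across the whole setup sequence. */
unsigned int setup(struct chip_sketch *chip)
{
	unsigned int caps;

	reg_lock(chip);
	caps = setup_port(chip);
	reg_unlock(chip);
	return caps;
}
```

Had setup_port() called top_get_caps() instead, relock_attempts would
become non-zero, which is the double acquisition described above.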

Compared to chip->info->ops->phylink_get_caps(), mv88e6xxx_get_caps()
also fixes up the supported_interfaces bitmap for internal ports, since
that can be done generically and does not require per-switch knowledge.
That's code which will no longer execute, however mv88e6xxx_setup_port()
doesn't need that. It just needs to look at the mac_capabilities bitmap.

Fixes: e5795cdb6ed8 ("net: dsa: mv88e6xxx: fix speed setting for CPU/DSA ports")
Reported-by: Maksim Kiselev <bigunclemax@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Maksim Kiselev <bigunclemax@gmail.com>
Link: https://lore.kernel.org/r/20221214110120.3368472-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

r6040: Fix kmemleak in probe and remove

[ Upstream commit 2cfa9aa2a8e6f9c5737ea89963dd28f1ff689635 ]

There is a memory leak reported by kmemleak:

  unreferenced object 0xffff888116111000 (size 2048):
    comm "modprobe", pid 817, jiffies 4294759745 (age 76.502s)
    hex dump (first 32 bytes):
      00 c4 0a 04 81 88 ff ff 08 10 11 16 81 88 ff ff  ................
      08 10 11 16 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    backtrace:
      [<ffffffff815bcd82>] kmalloc_trace+0x22/0x60
      [<ffffffff827e20ee>] phy_device_create+0x4e/0x90
      [<ffffffff827e6072>] get_phy_device+0xd2/0x220
      [<ffffffff827e7844>] mdiobus_scan+0xa4/0x2e0
      [<ffffffff827e8be2>] __mdiobus_register+0x482/0x8b0
      [<ffffffffa01f5d24>] r6040_init_one+0x714/0xd2c [r6040]
      ...

The problem occurs in probe process as follows:
  r6040_init_one:
    mdiobus_register
      mdiobus_scan    <- alloc and register phy_device,
                         the reference count of phy_device is 3
    r6040_mii_probe
      phy_connect     <- connect to the first phy_device,
                         so the reference count of the first
                         phy_device is 4, others are 3
    register_netdev   <- fault inject succeeded, goto error handling path

    // error handling path
    err_out_mdio_unregister:
      mdiobus_unregister(lp->mii_bus);
    err_out_mdio:
      mdiobus_free(lp->mii_bus);    <- the reference count of the first
                                       phy_device is 1, it is not released
                                       and other phy_devices are released
  // similarly, the remove process also has the same problem

The root cause is that the phy_device is not disconnected when an r6040
device is removed in r6040_remove_one(), or on the error handling path
after r6040_mii_probe() has succeeded. In r6040_mii_probe(), the network
device is connected to the first PHY device of mii_bus so that the
connected driver is notified when the link status changes, which is the
default behavior of the PHY infrastructure to handle everything.
Therefore the phy_device should be disconnected when an r6040 device is
removed or on the error handling path.

Fix it by adding phy_disconnect() when an r6040 device is removed and on
the error handling path after r6040_mii_probe() has succeeded.

Fixes: 3b5d8ff40306 ("r6040: implement phylib")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()

[ Upstream commit 1be348619c5bd568989b0e43e66eef7bc089f67d ]

There is a race in which a live SOCK_SEQPACKET socket
may change its state from TCP_ESTABLISHED to TCP_CLOSE:

unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
  sock_orphan(peer)
    sock_set_flag(peer, SOCK_DEAD)
                                           sock_alloc_send_pskb()
                                             if !(sk->sk_shutdown & SEND_SHUTDOWN)
                                               OK
                                           if sock_flag(peer, SOCK_DEAD)
                                             sk->sk_state = TCP_CLOSE
  sk->sk_shutdown = SHUTDOWN_MASK

After that, socket sk remains almost normal: it is able to connect, listen,
accept and recvmsg, but it can't sendmsg.

Since this is the only way for a live SOCK_SEQPACKET socket to change
its state like this, we had better fix this strange and potentially
dangerous corner case.

Note that we will return EPIPE here, as is normally done in
sock_alloc_send_pskb(). The originally used ECONNREFUSED looks strange,
since it is odd to return a specific retval depending on a race in the
kernel that the user can't influence.

Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock
to fix race with unix_dgram_connect():

unix_dgram_connect(other)            unix_dgram_sendmsg(sk)
                                       unix_peer(sk) = NULL
                                       unix_state_unlock(sk)
  unix_state_double_lock(sk, other)
  sk->sk_state  = TCP_ESTABLISHED
  unix_peer(sk) = other
  unix_state_double_unlock(sk, other)
                                       sk->sk_state  = TCP_CLOSE

This patch fixes both of these races.

Fixes: ba888b301cfd ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
Signed-off-by: Kirill Tkhai <tkhai@ya.ru>
Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

nfc: pn533: Clear nfc_target before being used

[ Upstream commit 34192af78d54701d906f6609e6b8e4fd144c8cd0 ]

Fix a slab-out-of-bounds read that occurs in nla_put() called from
nfc_genl_send_target() when target->sensb_res_len, which is duplicated
from an nfc_target in pn533, is too large as the nfc_target is not
properly initialized and retains garbage values. Clear nfc_targets with
memset() before they are used.
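
The effect of the missing memset() can be simulated deterministically by
pre-filling the structure with a known byte pattern in place of real
stack garbage. struct target_sketch, SENSB_RES_MAX and
sensb_len_after_init() are hypothetical names used only for this
illustration.

```c
#include <stddef.h>
#include <string.h>

#define SENSB_RES_MAX 12	/* hypothetical buffer size */

struct target_sketch {
	unsigned char sensb_res_len;
	unsigned char sensb_res[SENSB_RES_MAX];
	unsigned char nfcid1_len;
};

/*
 * Return the sensb_res_len a consumer would see after a code path that
 * fills in only nfcid1_len. The 0xAA fill simulates stale memory.
 */
size_t sensb_len_after_init(int clear_first)
{
	struct target_sketch target;

	memset(&target, 0xAA, sizeof(target));	/* simulated garbage */

	if (clear_first)
		memset(&target, 0, sizeof(target));	/* the fix */

	target.nfcid1_len = 4;	/* the only field this path sets */

	return target.sensb_res_len;
}
```

Without clearing, the length field keeps the stale value 0xAA, which is
larger than the buffer; a later copy of that many bytes would read out
of bounds. With clearing, the length is 0.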

Found by a modified version of syzkaller.

BUG: KASAN: slab-out-of-bounds in nla_put
Call Trace:
memcpy
nla_put
nfc_genl_dump_targets
genl_lock_dumpit
netlink_dump
__netlink_dump_start
genl_family_rcv_msg_dumpit
genl_rcv_msg
netlink_rcv_skb
genl_rcv
netlink_unicast
netlink_sendmsg
sock_sendmsg
____sys_sendmsg
___sys_sendmsg
__sys_sendmsg
do_syscall_64

Fixes: 790a9190d3d4 ("NFC: pn533: Send ATR_REQ directly for active device detection")
Fixes: e98b9959c9df ("NFC: DEP link hook implementation for pn533")
Signed-off-by: Minsuk Kang <linuxlovemin@yonsei.ac.kr>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20221214015139.119673-1-linuxlovemin@yonsei.ac.kr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: enetc: avoid buffer leaks on xdp_do_redirect() failure

[ Upstream commit 1b427fbbdff060e42ba884178a8e68749e4e7add ]

Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software
BD in the RX ring between index orig_i and i can have one of 2 refcount
values on its page.

We are the owner of the current buffer that is being processed, so the
refcount will be at least 1.

If the current owner of the buffer at the diametrically opposed index
in the RX ring (i.o.w, the other half of this page) has not yet called
kfree(), this page's refcount could even be 2.

enetc_page_reusable() in enetc_flip_rx_buff() tests for the page
refcount against 1, and [ if it's 2 ] does not attempt to reuse it.

But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call,
the page refcount can have one of 3 values. It can also be 0, if there
is no owner of the other page half, and xdp_do_redirect() for this
buffer ran so far that it triggered a flush of the devmap/cpumap bulk
queue, and the consumers of those bulk queues also freed the buffer,
all by the time xdp_do_redirect() returns the execution back to enetc.

This is the reason why enetc_flip_rx_buff() is called before
xdp_do_redirect(), but there is a big flaw with that reasoning:
enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the
enetc_page_reusable() branch, and if xdp_do_redirect() returns an error,
we call enetc_xdp_free(), which does not deal gracefully with that.

In fact, what happens is quite special. The page refcounts start as 1.
enetc_flip_rx_buff() figures they're reusable, transfers these
rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and
bumps the refcount to 2. When xdp_do_redirect() later returns an error,
we call the no-op enetc_xdp_free(), but we still haven't lost the
reference to that page. A copy of it is still at rx_ring->next_to_alloc,
but that has refcount 2 (and there are no concurrent owners of it in
flight, to drop the refcount). What really kills the system is when
we'll flip the rx_swbd->page the second time around. With an updated
refcount of 2, the page will not be reusable and we'll really leak it.
Then enetc_new_page() will have to allocate more pages, which will then
eventually leak again on further errors from xdp_do_redirect().

The problem, summarized, is that we zeroize rx_swbd->page before we're
completely done with it, and this makes it impossible for the error path
to do something with it.

Since the packet is potentially multi-buffer and therefore the
rx_swbd->page is potentially an array, manual passing of the old
pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit
difficult.

For the sake of going with a simple solution, we accept the possibility
of racing with xdp_do_redirect(), and we move the flip procedure to
execute only on the redirect success path. By racing, I mean that the
page may be deemed as not reusable by enetc (having a refcount of 0),
but there will be no leak in that case, either.

Once we accept that, we have something better to do with buffers on
XDP_REDIRECT failure. Since we haven't performed half-page flipping yet,
we won't, either (and this way, we can avoid enetc_xdp_free()
completely, which gives the entire page to the slab allocator).
Instead, we'll call enetc_xdp_drop(), which will recycle this half of
the buffer back to the RX ring.

Fixes: 774a2709afbd ("net: enetc: add support for XDP_REDIRECT")
Suggested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

media: v4l2-ctrls-api.c: add back dropped ctrl->is_new = 1

[ Upstream commit 6e2d718ac395634685e242678856f555d0fe35b9 ]

The patch adding support for dynamically allocated arrays accidentally
dropped the line setting ctrl->is_new to 1, thus new string values were
always ignored.

Fixes: 67b8b9a85cda ("media: v4l2-ctrls: add support for dynamically allocated arrays.")
Reported-by: Alice Yuan <alice.yuan@nxp.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bpf: prevent leak of lsm program after failed attach

[ Upstream commit 2c9c883a8fa2f78557a44931bee13ad3d0bc47e0 ]

In [0], we added the ability to bpf_prog_attach LSM programs to cgroups,
but in our validation to make sure the prog is meant to be attached to
BPF_LSM_CGROUP, we return too early if the check fails. As a result,
prog's refcnt is never decremented (through bpf_prog_put), leaving the
LSM program alive past the end of its expected lifecycle.
This fix allows the decrement to take place.
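
The shape of the bug and of the fix can be sketched with a plain counter.
The names below (struct prog_sketch, prog_get(), prog_put(),
attach_buggy(), attach_fixed()) are hypothetical and only model the
reference counting, not the real attach path.

```c
#include <errno.h>

struct prog_sketch {
	int refcnt;
	int attach_type_ok;	/* 0: the validation check fails */
};

void prog_get(struct prog_sketch *prog)
{
	prog->refcnt++;
}

void prog_put(struct prog_sketch *prog)
{
	prog->refcnt--;
}

/* Buggy shape: the early return skips the put on validation failure. */
int attach_buggy(struct prog_sketch *prog)
{
	prog_get(prog);

	if (!prog->attach_type_ok)
		return -EINVAL;	/* reference leaked */

	return 0;	/* on success the attachment keeps the reference */
}

/* Fixed shape: record the error and fall through to the common put. */
int attach_fixed(struct prog_sketch *prog)
{
	int ret = 0;

	prog_get(prog);

	if (!prog->attach_type_ok)
		ret = -EINVAL;

	if (ret)
		prog_put(prog);

	return ret;
}
```

Starting from a refcnt of 1, a failed attach_buggy() leaves it at 2,
while a failed attach_fixed() leaves it at 1.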

[0] https://lore.kernel.org/all/20220628174314.1216643-4-sdf@google.com/

Fixes: 633629fdfa3a ("bpf: per-cgroup lsm flavor")
Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221213175714.31963-1-milan@mdaverde.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION

[ Upstream commit 78bc18ca05e3fa2f854d9774f6b5a322f7c493fa ]

BPF selftests require CONFIG_FUNCTION_ERROR_INJECTION to work. However,
CONFIG_FUNCTION_ERROR_INJECTION is no longer 'y' by default after recent
changes. As a result, we are seeing errors like the following from BPF CI:

   bpf_testmod_test_read() is not modifiable
   __x64_sys_setdomainname is not sleepable
   __x64_sys_getpgid is not sleepable

Fix this by explicitly selecting CONFIG_FUNCTION_ERROR_INJECTION in the
selftest config.

Fixes: 71be7156f240 ("error-injection: Add prompt for function error injection")
Reported-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20221213220500.3427947-1-song@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

block, bfq: fix possible uaf for 'bfqq->bic'

[ Upstream commit 4f2a4111753a51d36ed1f56198deef4041bbe093 ]

Our test reports a uaf for 'bfqq->bic' in 5.10:

==================================================================
BUG: KASAN: use-after-free in bfq_select_queue+0x378/0xa30

CPU: 6 PID: 2318352 Comm: fsstress Kdump: loaded Not tainted 5.10.0-60.18.0.50.h602.kasan.eulerosv2r11.x86_64 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-20220320_160524-szxrtosci10000 04/01/2014
Call Trace:
bfq_select_queue+0x378/0xa30
bfq_dispatch_request+0xe8/0x130
blk_mq_do_dispatch_sched+0x62/0xb0
__blk_mq_sched_dispatch_requests+0x215/0x2a0
blk_mq_sched_dispatch_requests+0x8f/0xd0
__blk_mq_run_hw_queue+0x98/0x180
__blk_mq_delay_run_hw_queue+0x22b/0x240
blk_mq_run_hw_queue+0xe3/0x190
blk_mq_sched_insert_requests+0x107/0x200
blk_mq_flush_plug_list+0x26e/0x3c0
blk_finish_plug+0x63/0x90
__iomap_dio_rw+0x7b5/0x910
iomap_dio_rw+0x36/0x80
ext4_dio_read_iter+0x146/0x190 [ext4]
ext4_file_read_iter+0x1e2/0x230 [ext4]
new_sync_read+0x29f/0x400
vfs_read+0x24e/0x2d0
ksys_read+0xd5/0x1b0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x61/0xc6

Commit 33a73331322a ("bfq: Split shared queues on move between cgroups")
changed the behavior so that moving a process to a new cgroup allocates a
new bfqq to use. However, the old bfqq and the new bfqq can point to the
same bic:

1) Initial state, two processes with io in the same cgroup.

Process 1       Process 2
(BIC1)          (BIC2)
  |  Λ            |  Λ
  |  |            |  |
  V  |            V  |
  bfqq1           bfqq2

2) bfqq1 is merged to bfqq2.

Process 1       Process 2
(BIC1)          (BIC2)
  |               |
   \-------------\|
                  V
  bfqq1           bfqq2(coop)

3) Process 1 exits, then new io (denoted IOA) is issued from Process 2.

(BIC2)
  |  Λ
  |  |
  V  |
  bfqq2(coop)

4) Before IOA is completed, move Process 2 to another cgroup and issue io.

Process 2
(BIC2)
   Λ
   |\--------------\
   |                V
  bfqq2           bfqq3

Now BIC2 points to bfqq3, while bfqq2 and bfqq3 both point to BIC2.
If all the requests are completed and Process 2 exits, BIC2 will be
freed, but there is no guarantee that bfqq2 will be freed before BIC2.

Fix the problem by clearing bfqq->bic while bfqq is detached from bic.
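
The idea of clearing the back-pointer on detach can be sketched in plain
userspace C. This is only an illustration, not the bfq code: the types
io_ctx and io_queue and the helper ctx_set_queue() are hypothetical
stand-ins for the bic, the bfqq and the place where a bic is pointed at
a new queue.

```c
#include <stddef.h>

struct io_queue;
struct io_ctx { struct io_queue *queue; };   /* stand-in for the bic  */
struct io_queue { struct io_ctx *ctx; };     /* stand-in for the bfqq */

/*
 * Point ctx at newq. If the previous queue still points back at ctx,
 * clear that back-pointer so it cannot dangle once ctx is freed.
 */
void ctx_set_queue(struct io_ctx *ctx, struct io_queue *newq)
{
    struct io_queue *old = ctx->queue;

    if (old && old->ctx == ctx)
        old->ctx = NULL;

    ctx->queue = newq;
    if (newq)
        newq->ctx = ctx;
}

/* Returns 1 if, after switching queues, only the new queue references ctx. */
int old_queue_detached(void)
{
    struct io_ctx ctx = { NULL };
    struct io_queue q2 = { NULL }, q3 = { NULL };

    ctx_set_queue(&ctx, &q2);
    ctx_set_queue(&ctx, &q3);   /* q2 is detached here */

    return q2.ctx == NULL && q3.ctx == &ctx && ctx.queue == &q3;
}
```

With the back-pointer cleared, the old queue can outlive the context
without holding a stale reference to it.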

Fixes: 33a73331322a ("bfq: Split shared queues on move between cgroups")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221214030430.3304151-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mISDN: hfcmulti: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave()

[ Upstream commit 0f0fb9c85901aa9cd372705879e900c719308d92 ]

It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.

skb_queue_purge() is called under spin_lock_irqsave() in handle_dmsg()
and hfcm_l1callback(), and it calls kfree_skb(). To fix this, use
skb_queue_splice_init() to move dch->squeue to a free queue, also
enqueue the tx_skb and rx_skb there, and finally call
__skb_queue_purge() to free the SKBs after unlock.
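
The splice-then-purge pattern can be sketched in userspace C as follows.
This is a simplified model under stated assumptions, not the driver code:
struct pkt, struct pkt_queue and all helper names are hypothetical, and
the lock_held flag merely stands in for "inside spin_lock_irqsave()".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct pkt { struct pkt *next; };
struct pkt_queue { struct pkt *head; };

bool lock_held;        /* stand-in for "inside spin_lock_irqsave()" */
int frees_under_lock;  /* number of frees that happened while locked */

static void pkt_free(struct pkt *p)
{
    if (lock_held)
        frees_under_lock++;
    free(p);
}

/* Move every packet from src to the front of dst and leave src empty. */
static void queue_splice_init(struct pkt_queue *src, struct pkt_queue *dst)
{
    struct pkt *tail = src->head;

    if (!tail)
        return;
    while (tail->next)
        tail = tail->next;
    tail->next = dst->head;
    dst->head = src->head;
    src->head = NULL;
}

/* Free every packet on q; returns how many were freed. */
static int queue_purge(struct pkt_queue *q)
{
    int n = 0;

    while (q->head) {
        struct pkt *p = q->head;

        q->head = p->next;
        pkt_free(p);
        n++;
    }
    return n;
}

/* Only pointer moves happen under the lock; the frees are deferred. */
int flush_queue(struct pkt_queue *devq)
{
    struct pkt_queue free_queue = { NULL };

    lock_held = true;
    queue_splice_init(devq, &free_queue);
    lock_held = false;

    return queue_purge(&free_queue);
}

/* Build a queue of n packets, flush it, and report how many were freed. */
int demo_flush(int n)
{
    struct pkt_queue devq = { NULL };

    for (int i = 0; i < n; i++) {
        struct pkt *p = malloc(sizeof(*p));

        if (!p)
            break;
        p->next = devq.head;
        devq.head = p;
    }
    return flush_queue(&devq);
}
```

The local free queue keeps the critical section down to a few pointer
updates, and every free runs after the lock is dropped.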

Fixes: ec89bda12621 ("Add mISDN HFC multiport driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mISDN: hfcpci: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave()

[ Upstream commit c8d978749aee057217a94690c7a294f06bbf3c5c ]

It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.

skb_queue_purge() is called under spin_lock_irqsave() in hfcpci_l2l1D(),
and it calls kfree_skb(). To fix this, use skb_queue_splice_init()
to move dch->squeue to a free queue, also enqueue the tx_skb and
rx_skb there, and finally call __skb_queue_purge() to free the SKBs
after unlock.

Fixes: f69abc4c0024 ("Add mISDN HFC PCI driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mISDN: hfcsusb: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave()

[ Upstream commit 20f58b80aac4ccb2c34138777f7849a39a5441b4 ]

It is not allowed to call kfree_skb() or consume_skb() from hardware
interrupt context or with hardware interrupts being disabled.

It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead.
The difference between them is the free reason: dev_kfree_skb_irq() means
the SKB was dropped due to an error, and dev_consume_skb_irq() means the
SKB was consumed normally.

skb_queue_purge() is called under spin_lock_irqsave() in hfcusb_l2l1D(),
and it calls kfree_skb(). To fix this, use skb_queue_splice_init()
to move dch->squeue to a free queue, also enqueue the tx_skb and
rx_skb there, and finally call __skb_queue_purge() to free the SKBs
after unlock.

In tx_iso_complete(), dev_kfree_skb() is called to consume the transmitted
SKB, so replace it with dev_consume_skb_irq().

Fixes: a60b1807051f ("mISDN: Add HFC USB driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: do failover when high prio link up

[ Upstream commit 151fcbd5586066443073b0db7ce1ffc45d69e19e ]

Currently, when a high prio link is enslaved, or when the current link
goes down, the high prio port can be selected. But when a high prio link
comes up, reselection of the active slave is not triggered. Fix it by
checking the link's prio when it comes up. Do the failover after looping
over all slaves, as there may be multiple high prio slaves up.

Reported-by: Liang Li <liali@redhat.com>
Fixes: 60e0039b48c4 ("Bonding: add per-port priority for failover re-selection")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

bonding: add missed __rcu annotation for curr_active_slave

[ Upstream commit 03e8965c320e1f9fd6e9bbf405b15898da00c51a ]

There is one direct access to bond->curr_active_slave in
bond_miimon_commit(). Protect it with rcu_access_pointer(),
since the later part of this function also uses it.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: 151fcbd55860 ("bonding: do failover when high prio link up")
Signed-off-by: Sasha Levin <sashal@kernel.org>

net: macsec: fix net device access prior to holding a lock

[ Upstream commit 6948e8493756bd0036aba3f523a8a29ed4849270 ]

Currently macsec offload selection update routine accesses
the net device prior to holding the relevant lock.
Fix by holding the lock prior to the device access.

Fixes: f78edd6a87dc ("net: macsec: add nla support for changing the offloading selection")
Reviewed-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Link: https://lore.kernel.org/r/20221211075532.28099-1-ehakim@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

nfsd: under NFSv4.1, fix double svc_xprt_put on rpc_create failure

[ Upstream commit b191a2e9acc539eca46c2bdde32f149d8ba5d428 ]

On error, `clp->cl_cb_conn.cb_xprt` should not be given a reference
to the xprt; otherwise both the client cleanup and the caller's
error handling path put it. It is better to delay handing over the
reference to a later branch.
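
A minimal userspace model of the double put, assuming a plain integer
reference count. The types and function names (struct xprt, struct
client, setup_buggy(), setup_fixed()) are hypothetical and only mirror
the ordering problem described above, not the nfsd code itself.

```c
#include <stddef.h>

struct xprt { int refs; };
struct client { struct xprt *cb_xprt; };

void xprt_put(struct xprt *x) { x->refs--; }

/* Buggy ordering: the client gets the pointer before the step that can fail. */
int setup_buggy(struct client *c, struct xprt *x, int create_fails)
{
    c->cb_xprt = x;
    if (create_fails)
        return -1;
    return 0;
}

/* Fixed ordering: hand the reference over only after the step succeeded. */
int setup_fixed(struct client *c, struct xprt *x, int create_fails)
{
    if (create_fails)
        return -1;
    c->cb_xprt = x;
    return 0;
}

/* Simulate a failed setup followed by client cleanup; returns the final count. */
int refs_after_failure(int use_fixed)
{
    struct xprt x = { 1 };          /* one reference, owned by the caller */
    struct client c = { NULL };
    int err = use_fixed ? setup_fixed(&c, &x, 1) : setup_buggy(&c, &x, 1);

    if (err)
        xprt_put(&x);               /* caller's error handling path */
    if (c.cb_xprt)
        xprt_put(c.cb_xprt);        /* client cleanup */

    return x.refs;
}
```

In this model the buggy ordering ends with a count of -1 (an underflow),
while the fixed ordering ends at 0.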

[   72.530665] refcount_t: underflow; use-after-free.
[   72.531933] WARNING: CPU: 0 PID: 173 at lib/refcount.c:28 refcount_warn_saturate+0xcf/0x120
[   72.533075] Modules linked in: nfsd(OE) nfsv4(OE) nfsv3(OE) nfs(OE) lockd(OE) compat_nfs_ssc(OE) nfs_acl(OE) rpcsec_gss_krb5(OE) auth_rpcgss(OE) rpcrdma(OE) dns_resolver fscache netfs grace rdma_cm iw_cm ib_cm sunrpc(OE) mlx5_ib mlx5_core mlxfw pci_hyperv_intf ib_uverbs ib_core xt_MASQUERADE nf_conntrack_netlink nft_counter xt_addrtype nft_compat br_netfilter bridge stp llc nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set overlay nf_tables nfnetlink crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xfs serio_raw virtio_net virtio_blk net_failover failover fuse [last unloaded: sunrpc]
[   72.540389] CPU: 0 PID: 173 Comm: kworker/u16:5 Tainted: G           OE     5.15.82-dan #1
[   72.541511] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+1084+97b81f61 04/01/2014
[   72.542717] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[   72.543575] RIP: 0010:refcount_warn_saturate+0xcf/0x120
[   72.544299] Code: 55 00 0f 0b 5d e9 01 50 98 00 80 3d 75 9e 39 08 00 0f 85 74 ff ff ff 48 c7 c7 e8 d1 60 8e c6 05 61 9e 39 08 01 e8 f6 51 55 00 <0f> 0b 5d e9 d9 4f 98 00 80 3d 4b 9e 39 08 00 0f 85 4c ff ff ff 48
[   72.546666] RSP: 0018:ffffb3f841157cf0 EFLAGS: 00010286
[   72.547393] RAX: 0000000000000026 RBX: ffff89ac6231d478 RCX: 0000000000000000
[   72.548324] RDX: ffff89adb7c2c2c0 RSI: ffff89adb7c205c0 RDI: ffff89adb7c205c0
[   72.549271] RBP: ffffb3f841157cf0 R08: 0000000000000000 R09: c0000000ffefffff
[   72.550209] R10: 0000000000000001 R11: ffffb3f841157ad0 R12: ffff89ac6231d180
[   72.551142] R13: ffff89ac6231d478 R14: ffff89ac40c06180 R15: ffff89ac6231d4b0
[   72.552089] FS:  0000000000000000(0000) GS:ffff89adb7c00000(0000) knlGS:0000000000000000
[   72.553175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   72.553934] CR2: 0000563a310506a8 CR3: 0000000109a66000 CR4: 0000000000350ef0
[   72.554874] Call Trace:
[   72.555278]  <TASK>
[   72.555614]  svc_xprt_put+0xaf/0xe0 [sunrpc]
[   72.556276]  nfsd4_process_cb_update.isra.11+0xb7/0x410 [nfsd]
[   72.557087]  ? update_load_avg+0x82/0x610
[   72.557652]  ? cpuacct_charge+0x60/0x70
[   72.558212]  ? dequeue_entity+0xdb/0x3e0
[   72.558765]  ? queued_spin_unlock+0x9/0x20
[   72.559358]  nfsd4_run_cb_work+0xfc/0x270 [nfsd]
[   72.560031]  process_one_work+0x1df/0x390
[   72.560600]  worker_thread+0x37/0x3b0
[   72.561644]  ? process_one_work+0x390/0x390
[   72.562247]  kthread+0x12f/0x150
[   72.562710]  ? set_kthread_struct+0x50/0x50
[   72.563309]  ret_from_fork+0x22/0x30
[   72.563818]  </TASK>
[   72.564189] ---[ end trace 031117b1c72ec616 ]---
[   72.566019] list_add corruption. next->prev should be prev (ffff89ac4977e538), but was ffff89ac4763e018. (next=ffff89ac4763e018).
[   72.567647] ------------[ cut here ]------------

Fixes: 29c16c9b2afb ("nfsd: Fix svc_xprt refcnt leak when setup callback client failed")
Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Cc: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/mediatek: Fix forever loop in error handling

[ Upstream commit 26e9ef4567354d0abc3dd60e3b8b0f51dc2f2a4f ]

There is a typo so this loop does i++ where i-- was intended. It will
result in looping until the kernel crashes.
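
A small sketch of the loop shape in question, assuming an error path
that walks a fixed-size table from the last slot down to 0. SLOT_MAX
and release_all() are hypothetical names used for illustration only.

```c
#include <stddef.h>

#define SLOT_MAX 4   /* hypothetical table size */

/* Release every held slot, walking from the last index down to 0. */
int release_all(int held[SLOT_MAX])
{
    int released = 0;

    /*
     * i-- ends the loop once i drops below 0. With i++ the condition
     * i >= 0 stays true and the index immediately runs past the end
     * of the array.
     */
    for (int i = SLOT_MAX - 1; i >= 0; i--) {
        if (!held[i])
            continue;
        held[i] = 0;
        released++;
    }
    return released;
}

/* Three of the four slots are held, so three releases are expected. */
int demo_release(void)
{
    int held[SLOT_MAX] = { 1, 0, 1, 1 };

    return release_all(held);
}
```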

Fixes: bae7e5700633 ("iommu/mediatek: Add error path for loop of mm_dts_parse")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Yong Wu <yong.wu@mediatek.com>
Link: https://lore.kernel.org/r/Y5C3mTam2nkbaz6o@kili
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtc: pcf85063: fix pcf85063_clkout_control

[ Upstream commit 35d44410eb2bb2929663c65f479b1bda59d7dc40 ]

pcf85063_clkout_control reads the wrong register but then updates the
correct one.

Reported-by: Janne Terho <janne.terho@ouman.fi>
Fixes: a8fe2ecb9f58 ("rtc: pcf85063: Add pcf85063 clkout control to common clock framework")
Link: https://lore.kernel.org/r/20221211223553.59955-1-alexandre.belloni@bootlin.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtc: pic32: Move devm_rtc_allocate_device earlier in pic32_rtc_probe()

[ Upstream commit 471b601db623e40de5b79b0d0398ddde9257575d ]

The pic32_rtc_enable(pdata, 0) and clk_disable_unprepare(pdata->clk)
should be called in the error handling of devm_rtc_allocate_device(),
so we should move devm_rtc_allocate_device earlier in pic32_rtc_probe()
to fix it.

Fixes: c3b00b0cd3a9 ("rtc: pic32: convert to devm_rtc_allocate_device")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://lore.kernel.org/r/20221123015953.1998521-1-cuigaosheng1@huawei.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

rtc: st-lpc: Add missing clk_disable_unprepare in st_rtc_probe()

[ Upstream commit d67c4dbd7f04b51f8e1315f6f7c2f7e009c5496e ]

The clk_disable_unprepare() should be called in the error handling
of clk_get_rate(), fix it.

Fixes: c80d01112db1 ("rtc: st: Add new driver for ST's LPC RTC")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Link: https://lore.kernel.org/r/20221123014805.1993052-1-cuigaosheng1@huawei.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

netfilter: flowtable: really fix NAT IPv6 offload

[ Upstream commit 096fdb7763740203029b0954aa3fad04a201265b ]

The for-loop was broken from the start. It translates to:

for (i = 0; i < 4; i += 4)

which means the loop statement is run only once, so only the highest
32-bit of the IPv6 address gets mangled.

Fix the loop increment.
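
The effect of the increment can be checked with a small userspace
sketch. ADDR_WORDS and visited_words() are hypothetical names; the
function only records which 32-bit word indices a loop with the given
step would touch.

```c
#include <stddef.h>

#define ADDR_WORDS 4   /* a 128-bit IPv6 address is four 32-bit words */

/* Returns a bitmap of the word indices visited by a loop with this step. */
unsigned int visited_words(size_t step)
{
    unsigned int bitmap = 0;

    if (step == 0)      /* guard: a zero step would never terminate */
        return 0;

    for (size_t i = 0; i < ADDR_WORDS; i += step)
        bitmap |= 1u << i;

    return bitmap;
}
```

With a step of 4 only word 0 is visited (bitmap 0x1); with a step of 1
all four words are visited (bitmap 0xF).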

Fixes: 9a5b3fe458b6 ("netfilter: flowtable: fix NAT IPv6 offload mangling")
Fixes: c7367e66e8d4 ("netfilter: nf_flow_table_offload: add IPv6 support")
Signed-off-by: Qingfang DENG <dqfext@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

mfd: pm8008: Fix return value check in pm8008_probe()

[ Upstream commit 4c017bc0e6cacd6a7d617cdd88e121a3defbea4d ]

In case of error, the function devm_regmap_init_i2c() returns
ERR_PTR() and never returns NULL. The NULL test in the return
value check should be replaced with IS_ERR().
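
Why a NULL test misses such an error can be shown with a userspace
sketch. The helpers err_ptr(), is_err() and ptr_err() below are
simplified stand-ins written for this example, and fake_regmap_init()
is a hypothetical function that, like the one described above, reports
failure as an encoded error pointer and never as NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the error-pointer helpers. */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical init function: returns an error pointer on failure. */
void *fake_regmap_init(int fail)
{
    static int dummy_map;

    return fail ? err_ptr(-ENODEV) : (void *)&dummy_map;
}

/* Wrong check: an error pointer is not NULL, so the failure goes unnoticed. */
int probe_with_null_check(int fail)
{
    void *map = fake_regmap_init(fail);

    if (!map)
        return -ENODEV;
    return 0;
}

/* Right check: decode the error from the pointer value. */
int probe_with_is_err_check(int fail)
{
    void *map = fake_regmap_init(fail);

    if (is_err(map))
        return (int)ptr_err(map);
    return 0;
}
```

Here probe_with_null_check(1) returns 0 even though initialization
failed, while probe_with_is_err_check(1) returns -ENODEV.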

Fixes: 80b23df0a5f6 ("mfd: pm8008: Add driver for QCOM PM8008 PMIC")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Guru Das Srinagesh <gurus@codeaurora.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20221125073626.1868229-1-yangyingliang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

mfd: qcom_rpm: Fix an error handling path in qcom_rpm_probe()

[ Upstream commit 497ef6f6bece66303334ccafd24b17a17c0db166 ]

If an error occurs after the clk_prepare_enable() call, a corresponding
clk_disable_unprepare() should be called.

Simplify code and switch to devm_clk_get_enabled() to fix it.

Fixes: 21dbf56e61d3 ("mfd: qcom_rpm: Handle message RAM clock")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/e39752476d02605b2be46cab7115f71255ce13a8.1668949256.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Sasha Levin <sashal@kernel.org>

mfd: bd957x: Fix Kconfig dependency on REGMAP_IRQ

[ Upstream commit 27774ddb1de14130980c55c98774eafbe204a3cc ]

The BD957x driver uses REGMAP_IRQ but neither selects nor depends on
it. This can cause build failures. Select REGMAP_IRQ for BD957X.

Fixes: 72f9bf98c2c2 ("mfd: bd9576: Add IRQ support")
Signed-off-by: Matti Vaittinen <matti.vaittinen@fi.rohmeurope.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/Y3SdCWkRr1L64SWK@dc75zzyyyyyyyyyyyyydt-3.rev.dnainternet.fi
Signed-off-by: Sasha Levin <sashal@kernel.org>

mfd: axp20x: Do not sleep in the power off handler

[ Upstream commit ecbeba728780085d77736f544866e6bde632dfed ]

Since commit 257b10e57045 ("ARM: Use do_kernel_power_off()"), the
function axp20x_power_off() now runs inside a RCU read-side critical
section, so it is not allowed to call msleep(). Use mdelay() instead.

Fixes: 257b10e57045 ("ARM: Use do_kernel_power_off()")
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20221105212909.6526-1-samuel@sholland.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

dt-bindings: mfd: qcom,spmi-pmic: Drop PWM reg dependency

[ Upstream commit 9529874f7038e8a2ac41a2fb9de0821562624548 ]

The PWM node is not a separate device and is expected to be part of parent
SPMI PMIC node, thus it obtains the address space from the parent. One IO
address in "reg" is also not correct description because LPG block maps to
several regions.

Fixes: f819c993f2a7 ("dt-bindings: mfd: convert to yaml Qualcomm SPMI PMIC")
Suggested-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org>
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20220928000517.228382-2-bryan.odonoghue@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/pseries/eeh: use correct API for error log size

[ Upstream commit 1fc16292dcea1eebf55563ef4fb6c74a440d2700 ]

rtas-error-log-max is not the name of an RTAS function, so rtas_token()
is not the appropriate API for retrieving its value. We already have
rtas_get_error_log_max() which returns a sensible value if the property
is absent for any reason, so use that instead.

Fixes: b1b323c7de0b ("powerpc/eeh: pseries platform EEH error log retrieval")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
[mpe: Drop no-longer possible error handling as noticed by ajd]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221118150751.469393-6-nathanl@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom: q6v5: Fix missing clk_disable_unprepare() in q6v5_wcss_qcs404_power_on()

[ Upstream commit fbfce6c61963457a729e584e03441879687f19fb ]

q6v5_wcss_qcs404_power_on() has no fail path for readl_poll_timeout().
Add a fail path for readl_poll_timeout().

Fixes: dfacd742218c ("remoteproc: qcom: wcss: Add non pas wcss Q6 support for QCS404")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221204082757.18850-1-shangxiaojing@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom_q6v5_pas: Fix missing of_node_put() in adsp_alloc_memory_region()

[ Upstream commit a048e5be20d544c5343dc546b8cfe40ffa015704 ]

The pointer node is returned by of_parse_phandle() with refcount
incremented. We should use of_node_put() on it when done.
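
The get/put balance can be modelled with a plain counter. Everything
here is hypothetical (struct node, node_get(), node_put(), lookup());
it only illustrates that a lookup which takes a reference must be
paired with a put once the caller is done.

```c
#include <stddef.h>

struct node { int refcount; };

struct node *node_get(struct node *n)
{
    if (n)
        n->refcount++;
    return n;
}

void node_put(struct node *n)
{
    if (n)
        n->refcount--;
}

/* Hypothetical lookup: returns the node with its refcount incremented. */
static struct node *lookup(struct node *target)
{
    return node_get(target);
}

/* Returns the refcount left behind after one lookup-and-use cycle. */
int refcount_after_use(int balanced)
{
    struct node target = { 1 };
    struct node *n = lookup(&target);

    /* ... read whatever is needed from n here ... */

    if (balanced)
        node_put(n);    /* drop the reference taken by lookup() */

    return target.refcount;
}
```

Without the put the count stays at 2 after the call; with it the count
returns to 1.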

Fixes: e8ba7e031c3a ("remoteproc: Introduce Qualcomm ADSP PIL")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221203070639.15128-1-yuancan@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom_q6v5_pas: detach power domains on remove

[ Upstream commit c50fc9b009dbb36c354bcfcc9401dbf36d3e36e7 ]

We need to detach from the power domains also on remove, not just on
probe fail so a subsequent probe works as expected.

Otherwise the following error appears on re-probe:

[   29.452005] sysfs: cannot create duplicate filename '/devices/genpd:0:3000000.remoteproc'
[   29.477121] CPU: 1 PID: 483 Comm: sh Tainted: G        W          6.1.0-rc4-00075-g71a113770bda #78
[   29.510319] Hardware name: Fairphone 4 (DT)
[   29.538335] Call trace:
[   29.564470]  dump_backtrace.part.0+0xe0/0xf0
[   29.592602]  show_stack+0x18/0x30
[   29.619616]  dump_stack_lvl+0x64/0x80
[   29.646834]  dump_stack+0x18/0x34
[   29.673541]  sysfs_warn_dup+0x60/0x7c
[   29.700592]  sysfs_create_dir_ns+0xec/0x110
[   29.728057]  kobject_add_internal+0xb8/0x374
[   29.755530]  kobject_add+0x9c/0x104
[   29.782072]  device_add+0xbc/0x8a0
[   29.808445]  device_register+0x20/0x30
[   29.835175]  genpd_dev_pm_attach_by_id+0xa4/0x190
[   29.862851]  genpd_dev_pm_attach_by_name+0x3c/0xb0
[   29.890472]  dev_pm_domain_attach_by_name+0x20/0x30
[   29.918212]  adsp_probe+0x278/0x580
[   29.944384]  platform_probe+0x68/0xc0
[   29.970603]  really_probe+0xbc/0x2dc
[   29.996662]  __driver_probe_device+0x78/0xe0
[   30.023491]  device_driver_attach+0x48/0xac
[   30.050215]  bind_store+0xb8/0x114
[   30.075957]  drv_attr_store+0x24/0x3c
[   30.101874]  sysfs_kf_write+0x44/0x54
[   30.127751]  kernfs_fop_write_iter+0x120/0x1f0
[   30.154448]  vfs_write+0x1ac/0x380
[   30.179937]  ksys_write+0x70/0x104
[   30.205274]  __arm64_sys_write+0x1c/0x2c
[   30.231060]  invoke_syscall+0x48/0x114
[   30.256594]  el0_svc_common.constprop.0+0x44/0xec
[   30.283183]  do_el0_svc+0x2c/0xd0
[   30.308320]  el0_svc+0x2c/0x84
[   30.333059]  el0t_64_sync_handler+0xf4/0x120
[   30.359001]  el0t_64_sync+0x18c/0x190
[   30.384385] kobject_add_internal failed for genpd:0:3000000.remoteproc with -EEXIST, don't try to register things with the same name in the same directory.
[   30.406029] remoteproc remoteproc0: releasing 3000000.remoteproc
[   30.416064] qcom_q6v5_pas: probe of 3000000.remoteproc failed with error -17

Fixes: 5353adf6979f ("remoteproc: qcom: pas: Vote for active/proxy power domains")
Reviewed-by: Sibi Sankar <quic_sibis@quicinc.com>
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221118090816.100012-2-luca.weiss@fairphone.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom_q6v5_pas: disable wakeup on probe fail or remove

[ Upstream commit ef2cc7c6abe79f927e329e72827bb6b8ebb5689f ]

Leaving wakeup enabled during probe fail (-EPROBE_DEFER) or remove makes
the subsequent probe fail.

[    3.749454] remoteproc remoteproc0: releasing 3000000.remoteproc
[    3.752949] qcom_q6v5_pas: probe of 3000000.remoteproc failed with error -17
[    3.878935] remoteproc remoteproc0: releasing 4080000.remoteproc
[    3.887602] qcom_q6v5_pas: probe of 4080000.remoteproc failed with error -17
[    4.319552] remoteproc remoteproc0: releasing 8300000.remoteproc
[    4.332716] qcom_q6v5_pas: probe of 8300000.remoteproc failed with error -17

Fix this by disabling wakeup in both cases so the driver can properly
probe on the next try.

Fixes: 913fd8a0bd99 ("remoteproc: core: Prevent system suspend during remoteproc recovery")
Fixes: f0c05f62dbcc ("remoteproc: qcom: pas: Mark devices as wakeup capable")
Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com>
Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org>
Reviewed-by: Sibi Sankar <quic_sibis@quicinc.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221118090816.100012-1-luca.weiss@fairphone.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: qcom: q6v5: Fix potential null-ptr-deref in q6v5_wcss_init_mmio()

[ Upstream commit 152382589858d68a112230b379b7583dc80924bd ]

q6v5_wcss_init_mmio() calls platform_get_resource_byname(), which may
fail and return NULL. devm_ioremap() then uses res->start as input, which
may cause a null-ptr-deref. Check the return value of
platform_get_resource_byname() to avoid the null-ptr-deref.

Fixes: dfacd742218c ("remoteproc: qcom: wcss: Add non pas wcss Q6 support for QCS404")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221125021641.29392-1-shangxiaojing@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

remoteproc: sysmon: fix memory leak in qcom_add_sysmon_subdev()

[ Upstream commit af7df92183fe81c5d7eec3f4a15e99afcac98042 ]

The kfree() should be called when of_irq_get_byname() fails or
devm_request_threaded_irq() fails in qcom_add_sysmon_subdev(),
otherwise there will be a memory leak, so add kfree() to fix it.

Fixes: 76780758a927 ("remoteproc: qcom: Add shutdown-ack irq")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
Link: https://lore.kernel.org/r/20221129105650.1539187-1-cuigaosheng1@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

RISC-V: KVM: Fix reg_val check in kvm_riscv_vcpu_set_reg_config()

[ Upstream commit b75b7ec3f6982636d2c0084df749592a1b17428e ]

The reg_val check in kvm_riscv_vcpu_set_reg_config() should only
be done for isa config register.

Fixes: 0aec3d416874 ("RISC-V: KVM: Improve ISA extension by using a bitmap")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

pwm: mediatek: always use bus clock for PWM on MT7622

[ Upstream commit b37e130d40246144b57cb7db09f70590a37ff595 ]

According to MT7622 Reference Manual for Development Board v1.0 the PWM
unit found in the MT7622 SoC also comes with the PWM_CK_26M_SEL register
at offset 0x210 just like other modern MediaTek ARM64 SoCs.
And also MT7622 sets that register to 0x00000001 on reset which is
described as 'Select 26M fix CLK as BCLK' in the datasheet.
Hence set has_ck_26m_sel to true also for MT7622 which results in the
driver writing 0 to the PWM_CK_26M_SEL register which is described as
'Select bus CLK as BCLK'.

Fixes: 79061647f6dc76 ("pwm: mediatek: Always use bus clock")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/Y1iF2slvSblf6bYK@makrotopia.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

pwm: mtk-disp: Fix the parameters calculated by the enabled flag of disp_pwm

[ Upstream commit d46d58b7e28cd29205b0e83372437ac2dbc12fad ]

The original mtk_disp_pwm_get_state() function wrongly uses bit 0 of
CON0 to judge whether the PWM is enabled.
However, that is indicated by a bit (at a machine dependent position) in
the DISP_PWM_EN register. Fix this accordingly.

Fixes: a00dbb80b81a ("pwm: mtk-disp: Implement atomic API .get_state()")
Signed-off-by: xinlei lee <xinlei.lee@mediatek.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Link: https://lore.kernel.org/r/1666172538-11652-1-git-send-email-xinlei.lee@mediatek.com
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

pwm: sifive: Call pwm_sifive_update_clock() while mutex is held

[ Upstream commit 9f5601ec0e33c4ec447e518a6d8e07ab82b4b022 ]

As was documented in commit e0ca9013058e ("pwm: sifive: Reduce time the
controller lock is held") a caller of pwm_sifive_update_clock() must
hold the mutex. So fix pwm_sifive_clock_notifier() to grab the lock.

While this necessity was only documented later, the race exists since
the driver was introduced.

Fixes: 1afcd3819818 ("pwm: sifive: Add a driver for SiFive SoC PWM")
Reported-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Link: https://lore.kernel.org/r/20221018061656.1428111-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/sun50i: Remove IOMMU_DOMAIN_IDENTITY

[ Upstream commit 45fa805abd3ef8f0831332fa8e0a1a93518aa5be ]

This driver treats IOMMU_DOMAIN_IDENTITY the same as UNMANAGED, which
cannot possibly be correct.

UNMANAGED domains are required to start out blocking all DMAs. This seems
to be what this driver does as it allocates a first level 'dt' for the IO
page table that is 0 filled.

Thus UNMANAGED looks like a working IO page table, and so IDENTITY must be
a mistake. Remove it.

Fixes: b52188050f20 ("iommu: Add Allwinner H6 IOMMU driver")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0-v1-97f0adf27b5e+1f0-s50_identity_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"

[ Upstream commit d78344cb445e4a8321805b61311df626042effa4 ]

Fix the smatch warnings:
drivers/iommu/mtk_iommu.c:878 mtk_iommu_mm_dts_parse() error: uninitialized
symbol 'larbnode'.

If someone abuses the dtsi node (does not follow the dt-binding
definition), for example by providing "mediatek,larbs" as a boolean
property, "larb_nr" will be zero and cause abnormal behavior.

To fix this problem and improve the code safety, add some checking
for the invalid input from dtsi, e.g. checking the larb_nr/larbid valid
range, and avoid "mediatek,larb-id" property conflicts in the smi-larb
nodes.

Fixes: 6cd178f40f78 ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20221018024258.19073-5-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/mediatek: Add error path for loop of mm_dts_parse

[ Upstream commit bae7e5700633e7ff6ba337cf8d8af22af3f400a8 ]

mtk_iommu_mm_dts_parse() parses the smi larb nodes. If parsing of the
i+1 larb fails, we should call put_device for the i..0 larbs.

Two things are worth noting:
1) The larbid mapping may not be linear, so we should loop over the
   whole array in the error path.
2) This line is moved: "data->larb_imu[id].dev = &plarbdev->dev;"
   now comes before "if (!plarbdev->dev.driver)". That means
   data->larb_imu[id].dev is set before the error path, so we don't need
   "platform_device_put(plarbdev)" again in the probe_defer case. All
   error cases rely on the "put_device" in the error path.

Fixes: 6cd178f40f78 ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20221018024258.19073-4-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/mediatek: Use component_match_add

[ Upstream commit c2b612bddc6fe48b9c3e1b684a52c5da923eb4b7 ]

In order to simplify the error path (avoid calling of_node_put), use
component_match_add instead of component_match_add_release, since we are
only interested in the "device" here. Then we can always call of_node_put
in the normal path.

Strictly speaking this is not a fix, but it prepares for adding the
error path, thus I add a Fixes tag too.

Fixes: 6cd178f40f78 ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20221018024258.19073-3-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

iommu/mediatek: Add platform_device_put for recovering the device refcnt

[ Upstream commit d738bd15346bca4fda06606923a40a1efac59217 ]

Add platform_device_put to match with of_find_device_by_node.

Meanwhile, I add a new variable "pcommdev" for the smi common device.
Otherwise, "platform_device_put(plarbdev)" for the smi-common dev may not
be readable. Also add a check for whether pcommdev is NULL.

Fixes: 6cd178f40f78 ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com>
Link: https://lore.kernel.org/r/20221018024258.19073-2-yong.wu@mediatek.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>

selftests/powerpc: Fix resource leaks

[ Upstream commit e6ef3d914b7bc99e7ab07232437ffd190626a340 ]

In check_all_cpu_dscr_defaults, opendir() opens the directory stream.
Add missing closedir() in the error path to release it.

In check_cpu_dscr_default, open() creates an open file descriptor.
Add missing close() in the error path to release it.
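
The pattern described above can be sketched in userspace C. This is only
an illustration, not the selftest code itself: the helper names
check_file_readable() and count_entries(), and the max_len condition used
to force an early exit, are hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: 0 if the file at path can be read, -1 otherwise. */
static int check_file_readable(const char *path)
{
	char buf[16];
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return -1;

	if (read(fd, buf, sizeof(buf)) < 0) {
		close(fd);	/* error path: release the descriptor too */
		return -1;
	}

	close(fd);
	return 0;
}

/*
 * Hypothetical helper: count the entries of dir_path, bailing out with -1
 * as soon as an entry name is longer than max_len.
 */
static int count_entries(const char *dir_path, size_t max_len)
{
	struct dirent *dp;
	DIR *dir;
	int count = 0;

	assert(dir_path != NULL);

	dir = opendir(dir_path);
	if (!dir)
		return -1;

	while ((dp = readdir(dir)) != NULL) {
		if (strlen(dp->d_name) > max_len) {
			closedir(dir);	/* error path: release the stream too */
			return -1;
		}
		count++;
	}

	closedir(dir);
	return count;
}
```

Every return taken after a successful open() or opendir() releases the
resource first, which is the property the fix restores.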

Fixes: 5aa23ffd02f7 ("selftests/powerpc: Add test for all DSCR sysfs interfaces")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221205084429.570654-1-linmq006@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/hv-gpci: Fix hv_gpci event list

[ Upstream commit 85c0ad890c48b4abeb5215aa00b251452dfa690f ]

Based on getPerfCountInfo v1.018 documentation, some of the
hv_gpci events were deprecated for platform firmware that
supports counter_info_version 0x8 or above.

Fix the hv_gpci event list by adding a new attribute group
called "hv_gpci_event_attrs_v6" and an "ENABLE_EVENTS_COUNTERINFO_V6"
macro to enable these events for platform firmware
that supports counter_info_version 0x6 or below, and by assigning
the hv_gpci event list based on the output counter info version
of the underlying platform.
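
A rough userspace sketch of version-based list selection follows. The
event names, the toy_* identifiers and the handling of versions between
0x6 and 0x8 are assumptions made for illustration; this is not the
driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed threshold: the legacy events are offered up to this version. */
#define TOY_COUNTER_INFO_V6	0x6

/* Hypothetical event names; both lists are NULL-terminated. */
static const char *const toy_events_common[] = {
	"event_a", "event_b", NULL
};
static const char *const toy_events_v6[] = {
	"event_a", "event_b", "event_legacy", NULL
};

/* Pick the event list that matches the reported counter info version. */
static const char *const *toy_select_events(unsigned int counter_info_version)
{
	if (counter_info_version <= TOY_COUNTER_INFO_V6)
		return toy_events_v6;
	return toy_events_common;
}

/* Count the entries of a NULL-terminated list. */
static size_t toy_count_events(const char *const *list)
{
	size_t n = 0;

	assert(list != NULL);
	while (list[n] != NULL)
		n++;
	return n;
}
```

With these toy lists, a version of 0x6 yields three events and a version
of 0x8 yields two, since the legacy entry is left out.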

Fixes: 94851814643f ("powerpc/perf/hv-gpci: add the remaining gpci requests")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe()

[ Upstream commit 93e91a658ec54c29f8909c8e8a0e68fb935c2989 ]

If platform_device_add() was not called or failed, platform_device_del()
cannot be used to clean up the memory; platform_device_put() should be
called in the error case instead.
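
The ownership rule can be illustrated with a toy reference-counted object
in userspace C. Everything named toy_* below is a hypothetical stand-in,
not the kernel's platform device API; the sketch only shows that an
object which was allocated but never registered is released by dropping
its reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted device; not the kernel's struct. */
struct toy_dev {
	int refcount;
	bool registered;
};

static int toy_live_devices;	/* objects allocated and not yet freed */

static struct toy_dev *toy_dev_alloc(void)
{
	struct toy_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;	/* the caller owns the initial reference */
	toy_live_devices++;
	return dev;
}

/* Drop one reference; free the object when the last one goes away. */
static void toy_dev_put(struct toy_dev *dev)
{
	assert(dev != NULL && dev->refcount > 0);
	if (--dev->refcount == 0) {
		toy_live_devices--;
		free(dev);
	}
}

/* Registration that can be forced to fail, to exercise the error path. */
static int toy_dev_add(struct toy_dev *dev, bool fail)
{
	assert(dev != NULL);
	if (fail)
		return -1;
	dev->registered = true;
	return 0;
}

/* Probe-like function: 0 on success (device stored in *out), -1 on error. */
static int toy_probe(bool fail_add, struct toy_dev **out)
{
	struct toy_dev *dev = toy_dev_alloc();

	if (!dev)
		return -1;

	if (toy_dev_add(dev, fail_add) != 0) {
		/* Never registered, so only the reference needs dropping. */
		toy_dev_put(dev);
		return -1;
	}

	*out = dev;
	return 0;
}
```

If the put in the error branch were missing, toy_live_devices would stay
at 1 after a failed probe, which is the leak the commit addresses.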

Fixes: 4432747d5443 ("[POWERPC] fsl_soc: add support for fsl_spi")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221029111626.429971-1-yangyingliang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/perf: callchain validate kernel stack pointer bounds

[ Upstream commit d2af28fb512ad305d622ee28da56902574f441e1 ]

The interrupt frame detection and loads from the hypothetical pt_regs
are not bounds-checked. The next-frame validation only bounds-checks
STACK_FRAME_OVERHEAD, which does not include the pt_regs. Add another
test for this.

The user could set r1 to be equal to the address matching the first
interrupt frame - STACK_INT_FRAME_SIZE, which is in the previous page
due to the kernel redzone, and induce the kernel to load the marker from
there. At a minimum, this could possibly cause a crash. If the user could
induce the previous page to contain a valid marker, then it might be
able to direct perf to read specific memory addresses in a way that
could be transmitted back to the user in the perf data.
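
As a simplified illustration of why one bounds check is not enough, the
sketch below validates a stack pointer against two different sizes. The
constants and the toy_valid_stack_range() helper are placeholders, not
the kernel's definitions or its actual validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder sizes for illustration only; real values are ABI-specific. */
#define TOY_STACK_SIZE		4096u
#define TOY_FRAME_OVERHEAD	112u	/* minimal frame header */
#define TOY_INT_FRAME_SIZE	512u	/* frame header plus saved registers */

/* True if the nbytes starting at sp lie entirely inside the stack. */
static bool toy_valid_stack_range(uintptr_t sp, uintptr_t stack_base,
				  uintptr_t nbytes)
{
	uintptr_t stack_end = stack_base + TOY_STACK_SIZE;

	assert(nbytes <= TOY_STACK_SIZE);

	if (sp < stack_base || sp > stack_end)
		return false;
	/* Compare remaining space, which avoids overflow in sp + nbytes. */
	return stack_end - sp >= nbytes;
}
```

A pointer 200 bytes below the end of the toy stack passes the check for
TOY_FRAME_OVERHEAD but fails it for TOY_INT_FRAME_SIZE, so code that
reads the larger frame needs the second, larger check.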

Fixes: 7317f1d80a88 ("perf_counter: powerpc: Add callchain support")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221127124942.1665522-4-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc: dts: turris1x.dts: Add channel labels for temperature sensor

[ Upstream commit 27fb47cad114d8e4002b4c8d03878df30b7f250a ]

Channel 0 of the SA56004ED chip refers to the internal SA56004ED chip
sensor (the chip itself is located on the board) and channel 1 refers to
the external sensor, which is connected to the temperature diode of the
P2020 CPU.

Fixes: 395b6dc8b931 ("powerpc: dts: Add DTS file for CZ.NIC Turris 1.x routers")
Signed-off-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Marek Behún <kabel@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220930123901.10251-1-pali@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>

kprobes: Fix check for probe enabled in kill_kprobe()

[ Upstream commit 185c2280c39805f69a7f9cc7b215770d3d0c4035 ]

In kill_kprobe(), the check whether disarm_kprobe_ftrace() needs to be
called always fails. This is because before that we set the
KPROBE_FLAG_GONE flag for kprobe so that "!kprobe_disabled(p)" is always
false.

The disarm_kprobe_ftrace() call was introduced by commit:

ca5b77d4fbb4 ("kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler")

to fix a NULL pointer dereference problem. If the probe is enabled and
we do not disarm it, this problem still exists.

Fix it by putting the probe enabled check before setting the
KPROBE_FLAG_GONE flag.
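
The ordering problem can be reproduced with a toy model in userspace C.
The toy_* names and flag values are illustrative stand-ins, and the model
assumes, as the text above implies, that a probe marked gone is reported
as disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy flag bits named after the flags in the text; values illustrative. */
#define TOY_FLAG_GONE		(1u << 0)
#define TOY_FLAG_DISABLED	(1u << 1)

struct toy_probe {
	unsigned int flags;
	bool armed;
};

/* A probe counts as disabled once it is either disabled or gone. */
static bool toy_probe_disabled(const struct toy_probe *p)
{
	return (p->flags & (TOY_FLAG_DISABLED | TOY_FLAG_GONE)) != 0;
}

static void toy_disarm(struct toy_probe *p)
{
	p->armed = false;
}

/* Buggy order: the flag is set first, so the check can never pass. */
static void toy_kill_buggy(struct toy_probe *p)
{
	assert(p != NULL);
	p->flags |= TOY_FLAG_GONE;
	if (!toy_probe_disabled(p))
		toy_disarm(p);
}

/* Fixed order: test the enabled state before marking the probe gone. */
static void toy_kill_fixed(struct toy_probe *p)
{
	assert(p != NULL);
	if (!toy_probe_disabled(p))
		toy_disarm(p);
	p->flags |= TOY_FLAG_GONE;
}
```

Starting from an enabled, armed probe, toy_kill_buggy() leaves it armed
while toy_kill_fixed() disarms it; both end with the gone flag set.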

Link: https://lore.kernel.org/all/20221126114316.201857-1-lihuafei1@huawei.com/
Fixes: 0757e4223f056 ("kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/pseries: fix plpks_read_var() code for different consumers

[ Upstream commit 1c032cead599d765aec9a2e80669ee468da4c9a2 ]

Even though plpks_read_var() is currently called to read variables
owned by different consumers, it internally supports only the OS consumer.

Fix plpks_read_var() to handle different consumers correctly.

Fixes: e82925ad5c67 ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-7-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/pseries: Return -EIO instead of -EINTR for H_ABORTED error

[ Upstream commit f5f8231d6a42fff56a2433114856129f909a1f94 ]

Some commands, e.g. "cat", might continue to retry on encountering
EINTR. This is not expected for the original error code H_ABORTED.

Map H_ABORTED to the more relevant Linux error code EIO.
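
A small userspace sketch of the effect follows. The status codes are
placeholders rather than the PAPR values, and toy_attempts_made() is a
hypothetical model of a caller that retries on EINTR, bounded so the
example terminates.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder hypervisor status codes; real values come from the spec. */
enum toy_hcall_status {
	TOY_H_SUCCESS = 0,
	TOY_H_ABORTED = 1,
};

/* Translate a toy hypervisor status into a negative Linux errno value. */
static int toy_status_to_errno(enum toy_hcall_status status)
{
	switch (status) {
	case TOY_H_SUCCESS:
		return 0;
	case TOY_H_ABORTED:
		return -EIO;	/* not -EINTR, so callers do not retry */
	default:
		return -EINVAL;
	}
}

/*
 * Model a caller that retries an operation which always returns err:
 * it keeps going while it sees -EINTR, up to max_attempts tries.
 * Returns the number of attempts made.
 */
static int toy_attempts_made(int err, int max_attempts)
{
	int attempts = 0;

	assert(max_attempts > 0);
	do {
		attempts++;
	} while (err == -EINTR && attempts < max_attempts);

	return attempts;
}
```

With -EINTR the modeled caller uses up all of its attempts, while the
-EIO produced for TOY_H_ABORTED makes it stop after the first one.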

Fixes: e82925ad5c67 ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-4-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

powerpc/pseries: Fix the H_CALL error code in PLPKS driver

[ Upstream commit c278814767750905bb228a0a0d68645e8a24a629 ]

The PAPR spec actually defines H_P1 as H_PARAMETER and maps H_ABORTED to
a different numerical value.

Fix the error codes as per PAPR Specification.

Fixes: e82925ad5c67 ("powerpc/pseries: define driver for Platform KeyStore")
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221106205839.600442-3-nayna@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>